(54) Apparatus for detecting a cut in a video

(57) Apparatus for detecting a cut in a video comprises arrangements for acquiring video images from a
source, for deriving from the video images a pixel-based difference metric, for deriving from the video images a
distribution-based difference metric, and for measuring video content of the video images to provide up-to-date
test criteria. Arrangements are included for combining the pixel-based difference metric and the distribution-
based difference metric, taking into account the up-to-date test criteria provided, so as to derive a scene
change candidate signal, and for filtering the scene change candidate signal so as to generate a scene
change frame list.
Description

Video is being generated at an ever-increasing rate every day. In order to use information from such video reposi-
tories efficiently and effectively, they must be properly indexed into a database. The most fundamental and important
task in this process is to parse the video into an appropriate set of units, which are known as shots. Most existing
approaches are based on preset thresholds or improper assumptions which reduce their applicability to a limited range
of video types.

The following description of the present invention also includes a mathematical formulation of the problem, a fast
and robust implementation of the mathematical formulation, and a cut browsing apparatus. Performance has been
demonstrated on over 149,000 video frames that include various video types such as sports and movies.

Sources of videos include defense/civilian satellites, scientific experiments, biomedical imaging, fingerprinting
devices, and home entertainment systems. In order to use information from such video repositories efficiently and
effectively, videos must be properly indexed into a database. Video indexing provides a fast and accurate way to access
desired video data based on its contents. The most fundamental and important task in this process is to parse the video
into an appropriate set of units, which are known as shots.

A shot in video refers to a contiguous recording of one or more video frames depicting a continuous action in time
and space. In a shot, the camera could remain fixed, or it may exhibit one of the characteristic motions such as panning,
tilting, or tracking. For most videos, shot changes or cuts are created intentionally by video/film directors. In the early
years, they were done on a splicer and an optical printer, while shot lists were kept on paper records, the count sheets.
Since 1985, most shot changes have been generated using modern editing machines, and information on each individual
shot is retained electronically on an Edit Decision List (EDL), which can be indexed directly into a database. However,
for most videos/films that were produced before the invention of such machines, this information, which was
recorded on paper, may no longer be accessible. This is certainly true for home videos and raw sports films, since cuts
are generated by turning the camcorders/film cameras on and off. In either case, cuts will have to be detected from the
video through manual or automatic means.

Contrary to what is claimed in the literature, the isolation of shots in a video is not a trivial task, considering the
complexity of a scene and the efficacy of modern editing technologies.

The transition from one shot to another may include visually abrupt straight cuts or camera breaks. It may also
include such effects as fades, dissolves, wipes, flips, superimposures, freeze or hold frames, flop-overs, tail-to-head
reverses, blow-ups, move-ins, repositions, and skip frames. See, for example, B. Balmuth, Introduction to Film Editing,
Focal Press, 1989.

Since the purpose of video indexing is to aid the retrieval of desired video clips from a database, it is important that
a 100% recall rate and maximum precision rate be maintained. Here, the recall rate is defined as the percentage of the
actual cuts that are detected, and the precision rate as the percentage of the detected cuts that are actual cuts.
Most existing approaches deal with very specific cases of a few of the above transition types, and even then a 100%
recall rate is never achieved. In addition, they are often tested on few types of videos and a few thousand video frames,
which is insufficient to make any realistic judgment on the performance of individual algorithms. Most existing algorithms
cannot be implemented near video rate, imposing a serious constraint on the range of applications they can be used for.
More importantly, they are often based on preset thresholds or improper assumptions which reduce their applicability to a
limited range of video types.
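
The two rates defined above can be computed directly from a list of detected cut frames and a ground-truth list. The sketch below assumes that a detection counts as correct only when its frame index exactly matches an actual cut (practical evaluations may allow a small tolerance); the function name `recall_precision` is hypothetical.

```python
def recall_precision(detected, actual):
    """Return (recall, precision) for detected vs. actual cut frame indices.

    Assumption: a detection is correct only if its frame index matches an
    actual cut exactly. Empty inputs are treated as perfect scores.
    """
    detected, actual = set(detected), set(actual)
    hits = len(detected & actual)                  # detected cuts that are actual cuts
    recall = hits / len(actual) if actual else 1.0
    precision = hits / len(detected) if detected else 1.0
    return recall, precision


# One false alarm (frame 90) and no missed cuts:
print(recall_precision(detected=[10, 57, 90, 120], actual=[10, 57, 120]))  # (1.0, 0.75)
```

A 100% recall rate means the first value is 1.0; false alarms only lower the second.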

Reference is also made to the following U.S. patent applications as containing material of close interest to the
present application: U.S. Patent Application Serial No. 08/221,227 filed March 31, 1994 in the name of Arman et al.;
U.S. Patent Application Serial No. 08/221,225 filed March 31, 1994 in the name of Arman et al.; U.S. Patent Application
Serial No. 08/221,221, filed March 31, 1994 in the name of Arman et al.; U.S. Patent Application Serial No. 08/343,793
filed November 22, 1994 in the name of Arman et al.; U.S. Patent Application Serial No. 08/346,453, filed November 29,
1994 in the name of Benson et al.

It is herein recognized that a good cut detection method is one that can

— provide the maximum recall and precision rates for visually abrupt cuts and camera breaks,

— detect cuts, from reading the video to the output of shot change frames, close to or near the video rate,

— consider the nonstationary nature of the cut detection problem,

— have a feedback mechanism to achieve a 100% recall rate, and work on a variety of videos and a large number of video
frames, and

— be independent of the encoders and different encoding algorithms, when applied to compressed video.

In accordance with an aspect of the invention, a method for detecting a cut in a video comprises the steps of (a)
acquiring video images from a source; (b) deriving from the video images a pixel-based difference metric; (c) deriving
from the video images a distribution-based difference metric; (d) measuring video content of the video images to pro-
vide up-to-date test criteria; (e) combining the pixel-based difference metric and the distribution-based difference
metric, taking into account the up-to-date test criteria provided in step (d), so as to derive a scene change candidate
signal; and (f) filtering the scene change candidate signal so as to generate a scene change frame list.

In accordance with another aspect of the invention, the pixel-based difference metric for each frame is the summa- 
tion of an absolute frame difference representative of image intensity value at selected pixel locations in a frame. 

In accordance with another aspect of the invention, the pixel-based difference metric for each frame $t$ is the sum of
an absolute frame difference,

$$ d_t = \sum_{i,j} \left| f_{ij}^{t} - f_{ij}^{t-1} \right| $$

where $f_{ij}^{t}$ represents the intensity value at pixel location $(i,j)$ in frame $t$.

In accordance with yet another aspect of the invention, each image is divided into a number of sub-regions, and
the distribution-based difference metric is a Kolmogorov-Smirnov test metric, except that one is computed for the
entire image as well as one for each of its sub-regions.

In accordance with yet another aspect of the invention, each image is equally divided into four sub-regions, and
the distribution-based difference metric is a Kolmogorov-Smirnov test metric, except that one is computed for the
entire image as well as one for each of the four equally divided sub-regions.
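
A minimal sketch of the per-region Kolmogorov-Smirnov computation, assuming frames are 2-D lists of integer intensities in `0..levels-1` and that the four sub-regions are the image quadrants (with odd row or column counts absorbed by the lower or right quadrants). How the five statistics are combined into a decision is not specified in this passage, so the sketch simply returns them; all function names are hypothetical.

```python
def ks_statistic(prev_px, curr_px, levels=256):
    """max_j |EDF_t(j) - EDF_{t-1}(j)| for two flat lists of pixel intensities."""
    def edf(px):
        hist = [0] * levels
        for v in px:
            hist[v] += 1
        cum, out = 0, []
        for count in hist:
            cum += count
            out.append(cum / len(px))
        return out
    return max(abs(a - b) for a, b in zip(edf(curr_px), edf(prev_px)))


def regions(frame):
    """Whole frame followed by its four quadrants, each flattened to a pixel list."""
    h, w = len(frame), len(frame[0])
    hh, hw = h // 2, w // 2
    boxes = [(0, h, 0, w),                       # entire image
             (0, hh, 0, hw), (0, hh, hw, w),     # top-left, top-right
             (hh, h, 0, hw), (hh, h, hw, w)]     # bottom-left, bottom-right
    return [[frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            for r0, r1, c0, c1 in boxes]


def hierarchical_ks(prev_frame, curr_frame, levels=256):
    """One KS statistic for the entire image plus one per quadrant (five values)."""
    return [ks_statistic(p, c, levels)
            for p, c in zip(regions(prev_frame), regions(curr_frame))]


prev = [[0] * 4 for _ in range(4)]
curr = [[3 if r < 2 and c < 2 else 0 for c in range(4)] for r in range(4)]
print(hierarchical_ks(prev, curr, levels=4))  # [0.25, 1.0, 0.0, 0.0, 0.0]
```

In this toy example a change confined to one quadrant yields the maximum statistic for that quadrant while the whole-image statistic stays small, which is one way to read the motivation for computing both.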

In accordance with yet another aspect of the invention, the step of measuring video content of the video images to
provide up-to-date test criteria provides the step (e) with the ability to automatically adjust to different video contents.

In accordance with yet another aspect of the invention, the video images are DC images represented by the base
frequency in the Discrete Cosine Transform coefficients characterizing the underlying image frame.

In accordance with still another aspect of the invention, the step of measuring video content of the video images to
provide up-to-date test criteria comprises collecting statistics from each DC image and each pair of DC images to rep-
resent current video content, namely an image contrast estimate and a motion estimate. The image contrast estimate is
computed based on a recursive scheme to suppress the influences of sudden lighting changes.

In accordance with still another aspect of the invention, the statistics collected from each DC image and each pair
of DC images to represent current video content represent an image contrast estimate and a motion estimate.

In accordance with still another aspect of the invention, the image contrast estimate is computed based on a recur- 
sive scheme to suppress the influences of sudden lighting changes. 

In accordance with still yet another aspect of the invention, the image contrast estimate is derived in accordance
with the following:

$$ \mathrm{contrast}_t = (1-\tau)\,\mathrm{contrast}_{t-1} + \tau\,\sigma_{t-1} $$

where $\sigma_{t-1}$ is the intensity variance estimate of the DC image at time $t-1$.

In accordance with still yet another aspect of the invention, the weighting parameter τ equals 0.6.
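
The recursive contrast estimate can be sketched as an exponential smoothing of the per-image intensity variance, contrast_t = (1 − τ)·contrast_{t−1} + τ·σ_{t−1} with τ = 0.6. The sketch assumes each DC image is a flat list of intensities and that σ is the population variance; seeding contrast_0 with the first image's variance is an assumption, since the text does not state an initial value.

```python
def variance(pixels):
    """Population intensity variance of one DC image (flat list)."""
    m = sum(pixels) / len(pixels)
    return sum((v - m) ** 2 for v in pixels) / len(pixels)


def contrast_estimates(dc_images, tau=0.6):
    """contrast_t = (1 - tau) * contrast_{t-1} + tau * sigma_{t-1}.

    sigma_{t-1} is the intensity variance of DC image t-1. Seeding
    contrast_0 with the first image's variance is an assumption.
    """
    sigma = [variance(img) for img in dc_images]
    contrast = [sigma[0]]
    for t in range(1, len(dc_images)):
        contrast.append((1 - tau) * contrast[t - 1] + tau * sigma[t - 1])
    return contrast


# Variances are 1, 0 and 4; the drop at t = 1 enters contrast_2 only, scaled by tau.
print(contrast_estimates([[0, 2, 0, 2], [0, 0, 0, 0], [0, 4, 0, 4]]))
```
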
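In accordance with a further aspect of the invention, the motion estimate is computed as follows:

$$ \mathrm{motion}_t = (1-\tau)\,\mathrm{motion}_{t-1} + \tau \left( \frac{1}{N} \sum_{i,j} \left| f_{ij}^{t-1} - f_{ij}^{t-2} \right| \right) $$

where $f_{ij}^{t-1}$ is the intensity value at pixel location $(i,j)$ of the DC image at time $t-1$, and $N$ is the size of the image.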
In accordance with another aspect of the invention, τ equals 0.6.
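
A sketch of the motion estimate as described: the mean absolute difference between the DC images at times t−1 and t−2, smoothed recursively with weight τ = 0.6. DC images are assumed to be flat lists of equal size N; setting the first two values to 0 (where no earlier pair exists) is an assumption of this sketch.

```python
def motion_estimates(dc_images, tau=0.6):
    """motion_t = (1 - tau) * motion_{t-1} + tau * (1/N) * sum_ij |f_ij^{t-1} - f_ij^{t-2}|.

    dc_images: list of flat DC images of equal size N. The first two values,
    for which f^{t-2} does not exist, are set to 0 (an assumption).
    """
    motion = [0.0] * min(2, len(dc_images))
    for t in range(2, len(dc_images)):
        a, b = dc_images[t - 1], dc_images[t - 2]
        mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        motion.append((1 - tau) * motion[t - 1] + tau * mean_abs_diff)
    return motion


# One global brightness jump between frames 0 and 1, then no further change:
print(motion_estimates([[0, 0], [2, 2], [2, 2], [2, 2]]))
```

The estimate rises when consecutive DC images differ and then decays geometrically by the factor (1 − τ) once they stop changing.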

In accordance with yet a further aspect of the invention, the image contrast and the motion estimates are applied
to a fuzzy engine to compute a new significance level for the hierarchical Kolmogorov-Smirnov test, the fuzzy engine
using a quadratic membership function, where each contrast measurement is divided into classes, from low to high,
each motion estimate is divided into classes, from slow to fast, and each significance level is divided into classes,
from high to low.

In accordance with yet a further aspect of the invention, each contrast measurement is divided into four classes,
low, middle, high, and extremely high; each motion estimate into three classes, slow, middle, and fast; and each signif-
icance level into five classes, high, middle high, middle, middle low, and low; and wherein the fuzzy rules are stated in
a simple IF/THEN format, where values are combined using AND (minimum) or OR (maximum) operations.

In accordance with yet a further aspect of the invention, a method for detecting a cut in a video includes a step of
defuzzifying the fuzzy rules to yield a crisp final output value by finding the center of gravity of the combined output
shape, whereby all rules are ensured of contributing to the final crisp result.
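
The sketch below only illustrates the inference mechanics named above: rule strengths combined with AND (minimum), rule outputs merged with OR (maximum), and center-of-gravity defuzzification over a sampled output axis. The membership shapes (inverted parabolas standing in for the quadratic membership function), the reduced number of classes, the normalized [0, 1] scales, and the three rules are all hypothetical placeholders, not the rule base of the invention.

```python
def quad_mf(center, width):
    """Hypothetical quadratic membership: 1 at `center`, falling to 0 at center +/- width."""
    return lambda x: max(0.0, 1.0 - ((x - center) / width) ** 2)


# Hypothetical classes on normalised [0, 1] scales (the text uses 4, 3 and 5 classes).
CONTRAST = {"low": quad_mf(0.0, 0.4), "high": quad_mf(1.0, 0.4)}
MOTION = {"slow": quad_mf(0.0, 0.5), "fast": quad_mf(1.0, 0.5)}
SIGNIFICANCE = {"low": quad_mf(0.0, 0.3), "middle": quad_mf(0.5, 0.3),
                "high": quad_mf(1.0, 0.3)}

# Hypothetical rules: IF contrast IS c AND motion IS m THEN significance IS s.
RULES = [("low", "slow", "high"), ("high", "slow", "middle"), ("high", "fast", "low")]


def infer_significance(contrast, motion, steps=201):
    """Min/max inference followed by centre-of-gravity defuzzification."""
    xs = [k / (steps - 1) for k in range(steps)]
    combined = [0.0] * steps
    for c_cls, m_cls, s_cls in RULES:
        strength = min(CONTRAST[c_cls](contrast), MOTION[m_cls](motion))  # AND
        for k, x in enumerate(xs):
            # Clip the rule's output set at its strength, then OR (max) into the total.
            combined[k] = max(combined[k], min(strength, SIGNIFICANCE[s_cls](x)))
    area = sum(combined)
    if area == 0.0:
        return None  # no rule fired for these inputs
    return sum(x * mu for x, mu in zip(xs, combined)) / area  # centre of gravity


print(infer_significance(0.0, 0.0))  # dominated by the "high" output class
print(infer_significance(1.0, 1.0))  # dominated by the "low" output class
```

Because the centroid is taken over the union of all clipped output sets, every rule that fires shifts the crisp result, which matches the stated reason for using the center of gravity.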



EP0 780 776 A1 



In accordance with still yet a further aspect of the invention, in the step (e) of combining the pixel-based difference
metric and the distribution-based difference metric, taking into account the up-to-date test criteria provided in step (d)
so as to derive a scene change candidate signal, the pixel-based difference metrics are treated as time series signals,
where both visually abrupt cuts and duplication of frames create observation outliers.
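In accordance with still yet a further aspect of the invention, the pixel-based difference metric is treated as a time
series signal, where both visually abrupt cuts and the duplication of frames create observation outliers obeying the
equation

$$ d_t = \begin{cases} f(d_{t-p}, d_{t-p+1}, \ldots, d_{t-1}) + \Delta & \text{if } t = t_0 \\ f(d_{t-p}, d_{t-p+1}, \ldots, d_{t-1}) & \text{otherwise} \end{cases} $$

where $t$ represents the time index, $t_0$ is the time at which the outlier occurs, $\Delta$ is the outlier,
$f(d_{t-p}, d_{t-p+1}, \ldots, d_{t-1})$ models the trend in the series, and

$$ f(d_{t-p}, d_{t-p+1}, \ldots, d_{t-1}) = \sum_{r=1}^{p} \alpha_r\, d_{t-r} $$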
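
A minimal sketch of the outlier view of the series, assuming the trend coefficients α_r are already known (here they are made-up values; in practice they would have to be estimated) and flagging a frame when its residual d_t − Σ α_r d_{t−r} exceeds k times the median absolute residual. The threshold rule and `k` are hypothetical. Only positive residuals, i.e. spikes such as abrupt cuts, are flagged; the negative residuals that follow a spike, or that duplicated frames produce, would need separate handling.

```python
from statistics import median


def ar_residuals(d, alphas):
    """Residual e_t = d_t - sum_{r=1..p} alpha_r * d_{t-r}, for every t >= p."""
    p = len(alphas)
    return {t: d[t] - sum(a * d[t - r] for r, a in enumerate(alphas, start=1))
            for t in range(p, len(d))}


def outlier_frames(d, alphas, k=5.0):
    """Frames whose positive residual exceeds k times the median absolute residual."""
    res = ar_residuals(d, alphas)
    scale = median(abs(e) for e in res.values()) or 1e-9   # guard against a zero scale
    return [t for t, e in res.items() if e > k * scale]


# Hypothetical difference series with one abrupt cut at frame 5 and made-up alphas.
d = [10, 11, 10, 12, 11, 80, 10, 11, 12, 10]
print(outlier_frames(d, alphas=[0.5, 0.5]))  # [5]
```
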



In accordance with still a further aspect of the invention, apparatus for detecting a cut in a video comprises: (a)
apparatus for acquiring video images from a source; (b) apparatus for deriving from the video images a pixel-based dif-
ference metric; (c) apparatus for deriving from the video images a distribution-based difference metric; (d) apparatus for
measuring video content of the video images to provide up-to-date test criteria; (e) apparatus for combining the pixel-
based difference metric and the distribution-based difference metric, taking into account the up-to-date test criteria
provided by apparatus (d), so as to derive a scene change candidate signal; and (f) apparatus for filtering the scene change
candidate signal so as to generate a scene change frame list.

In accordance with still another aspect, the invention includes apparatus for detecting a cut in a video in accordance
with claim 20, including apparatus for presenting two cross-section images of the video images: a horizontal cross-section
image in a horizontal direction and a vertical cross-section image in a vertical direction of the video volume.

In accordance with yet another aspect of the invention, each cross-section image is constructed by sampling one
row (or column) from every image, reducing the amount of information from a two-dimensional image to two one-
dimensional image strips.
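
Building the two cross-section images amounts to sampling one row and one column from every frame. Which row and column are used is not stated here; the sketch assumes the middle ones, and it stacks the strips so that time runs down the horizontal image and across the vertical image, a layout chosen for illustration.

```python
def cross_sections(frames, row=None, col=None):
    """Return (horizontal, vertical) cross-section images of a frame sequence.

    frames: list of equally sized 2-D frames (lists of rows). By default the
    middle row and the middle column of every frame are sampled (an assumption).
    horizontal: one row per frame, so time runs downwards (T x W).
    vertical:   one column per frame, so time runs left to right (H x T).
    """
    h, w = len(frames[0]), len(frames[0][0])
    row = h // 2 if row is None else row
    col = w // 2 if col is None else col
    horizontal = [list(f[row]) for f in frames]
    column_strips = [[f[r][col] for r in range(h)] for f in frames]
    vertical = [list(pixels) for pixels in zip(*column_strips)]
    return horizontal, vertical


frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(cross_sections(frames))  # ([[3, 4], [7, 8]], [[2, 6], [4, 8]])
```

In such images an abrupt cut appears as a discontinuity between consecutive rows (horizontal image) or columns (vertical image), which is what makes them useful for spotting missed or misdetected shots.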

In accordance with still yet another aspect of the invention, the horizontal and vertical cross-section images are
combined into one image segmented into two bands according to a list of detected shots, whereby a level of abstraction
is presented that is just enough to reveal whether there is a missed or misdetected shot.

The invention will be more clearly understood from the following detailed description of preferred embodiments, in
conjunction with the Drawing, in which

Figure 1 shows that a traditionally used, prior art cut detection algorithm can be categorized as a process comprising
four steps, data acquisition, difference metric collection, detection, and decision, wherein the dashed boxes
indicate optional steps which can be found in some algorithms and wherein the delay circuit is a mechanism to uti-
lize information from both the past and the future frames;

Figure 2 shows four types of 3:2 pulldown;

Figure 3 shows a histogram of a typical inter-frame difference image that does not correspond to a shot change;

Figure 4 shows browsing apparatus in accordance with the invention; 

Figure 5(a) shows two cross sections in video volumes and Figure 5(b) shows a space-time image made up of two
cross sections, useful in understanding the invention;

Figure 6 shows examples of cross section patterns in accordance with the invention, wherein Figure 6(a) shows
visually abrupt cuts and Figure 6(b) shows gradual shot transitions, in black and white, and Figures 6(c) and 6(d)
are the same figures in color;

Figure 7 shows a flow diagram of a cut detection algorithm in accordance with the invention; 

Figure 8 shows a sub flow diagram of the difference metric collection step in accordance with the invention;
Figure 9 shows a sub flow diagram of the adaptation step in accordance with the invention;

Figure 10 shows the membership function of contrast estimate in accordance with the invention;

Figure 11 shows the membership function of motion estimate in accordance with the invention;

Figure 12 shows the membership function of the quantity which is proportional to the significance level in accordance
with the invention;

Figure 13 shows a sub flow diagram of the detection step in accordance with the invention;

Figure 14 shows an example of trend estimation, wherein Figure 14(a) shows that a type I observation outlier is
detected and Figure 14(b) shows that a type II observation outlier is detected; and

Figure 15 shows cut browser apparatus incorporating the invention, 15(a) being in black and white and 15(b) being
the same figure in color.

The present invention relates to an automatic cut detection method and a cut browsing apparatus that satisfy all
criteria listed above. The cut detection is formulated as a non-stationary time series outlier detection problem.

The efforts on cut detection can be traced back to D. Coll and G. Choma, "Image Activity Characteristics in Broad-
cast Television," IEEE Trans. on Communications, Vol. 26, pp. 1201-1206, 1976, where the authors perform an exten-
sive experimental study of frame difference signals using four different types of videos: a football game, a drama show,
a talk show, and a cartoon. The authors divide each uncompressed video frame into 8x8 blocks, each of which is rep-
resented by its average grayscale value. In their study, the average of the magnitudes of the differences between cor-
responding blocks of consecutive frames is used as the difference metric. Coll and Choma show that a threshold
determined experimentally can detect shot changes that match the accuracy of a human observer. More scene change
detection algorithms based on uncompressed video can also be found in the computer vision literature. See, for exam-
ple, I. Sethi, V. Salari, and S. Vemuri, "Image Sequence Segmentation using Motion Coherence," Proc. First Interna-
tional Conference on Computer Vision, pp. 667-671, 1987; and Y. Z. Hsu, H.-H. Nagel, and G. Rekers, "New Likelihood
Test Methods for Change Detection in Image Sequences," CVGIP, 26, pp. 73-106, 1984.

Digital video can be stored and transmitted in compressed form. See Joint Photographic Experts Group (JPEG),
ISO/IEC JTC1 SC29 WG1, JPEG, ISO/IEC 10918; Moving Picture Experts Group (MPEG), ISO/IEC JTC1 SC29
WG11, MPEG-1, ISO/IEC 11172, and MPEG-2, ISO/IEC 13818; and International Telegraph and Telephone Consult-
ative Committee (CCITT), CCITT Recommendation H.261, Video Codec for Audiovisual Services at p x 64 kbit/s, Dec.
1990.

Performing cut detection as well as image processing on compressed video saves unnecessary decompression-
compression effort. This idea led to many efforts in pursuing solutions that can process compressed video directly, as
disclosed by F. Arman et al. in U.S. patent applications Serial No. 08/221,227 and 08/221,227, presently pending.
Arman et al. have developed a scene change detection algorithm for JPEG and movie-JPEG video, where a subset of
Discrete Cosine Transform (DCT) coefficients is used to characterize the underlying frame. Unfortunately, full DCT coef-
ficients are difficult or impractical to obtain in MPEG or H.261 video without full-scale decoding. This is because motion
vectors are quantities in the spatial domain, while DCT coefficients are quantities in the frequency domain.

Other researchers have proposed to use motion vectors directly, either to filter out scene change frames or to detect
scene cuts. See, for example, H.-C. H. Liu and G. L. Zick, "Scene Decomposition of MPEG Compressed Video," SPIE
Vol. 2419, Digital Video Compression Algorithms and Technologies, pp. 26-37, 1995; J. Meng, Y. Juan, and S.-F. Chang,
"Scene Change Detection in a MPEG Compressed Video Sequence," SPIE Vol. 2419, Digital Video Compression
Algorithms and Technologies, pp. 14-25, 1995; and H. Zhang, C. Y. Low, and S. W. Smoliar, "Video Parsing and Brows-
ing Using Compressed Data," Multimedia Tools and Applications, 1, pp. 89-111, 1995.

They are often based on the ratio of the number of forward-predicted macroblocks to the total number of macro-
blocks. Since there is no standard criterion in determining whether a certain macroblock should be inter-coded (tem-
porally predicted from a previous reconstructed picture) or intra-coded (like a baseline JPEG picture) during the encod-
ing process, this approach is very sensitive to different encoders and types of encoding algorithms.

Most cut detection algorithms in the literature, whether they take uncompressed or compressed video data, can be
categorized as a process comprising four steps: data acquisition, difference metric collection, detection, and
decision, as illustrated in Figure 1. The dashed boxes indicate optional steps, which are found in some algorithms. The
delay circuit is a mechanism to use information from both past and future frames.

During the data acquisition stage, compressed video based approaches often extract DCT coefficients or motion
vectors. See, for example, the above-cited application of Arman et al.; H. Ishikawa and H. Matsumoto, "Method for
Detecting a Scene Change and Image Editing Apparatus," European Patent 0-615-245-A2, filed July 3, 1994; and H.-C. H.
Liu and G. L. Zick, "Scene Decomposition of MPEG Compressed Video," SPIE Vol. 2419, Digital Video Compression
Algorithms and Technologies, pp. 26-37, 1995. There are a total of 64 DCT coefficients, including one DC term (the base
frequency in the DCT coefficients) and 63 AC terms (the higher frequencies in the DCT coefficients). The amount of
decoding varies among different algorithms. Some only decode DC terms; see, for example, J. Meng, Y. Juan, and
S.-F. Chang, "Scene Change Detection in a MPEG Compressed Video Sequence," SPIE Vol. 2419, Digital Video Com-
pression Algorithms and Technologies, pp. 14-25, 1995. Others only extract the number of forward-coded and
backward-coded motion vectors; see, for example, H.-C. H. Liu and G. L. Zick, "Scene Decomposition of MPEG Com-
pressed Video," SPIE Vol. 2419, Digital Video Compression Algorithms and Technologies, pp. 26-37, 1995.

Most such algorithms try to avoid full-scale decoding in order to be computationally efficient. For approaches that
use uncompressed video data, each digitized image is often smoothed or sub-sampled; see, for example, P. Aigrain and
P. Joly, "The Automatic Real-Time Analysis of Film Editing and Transition Effects and its Applications," Computer and
Graphics, Vol. 18, No. 1, pp. 93-103, 1994.

Once video data is acquired, except for those algorithms that use motion vector information explicitly, most in the
literature collect one difference metric per frame. Such difference metrics, in general, can be categorized into pixel-based
and distribution-based types.

In the pixel-based category, the most popular difference metric is the sum of squared differences, namely

$$ d_t = \sum_{i,j} \left( f_{ij}^{t} - f_{ij}^{t-1} \right)^2 $$

where $f_{ij}^{t}$ represents the intensity value at pixel location $(i,j)$ in frame $t$. An alternative choice requiring less
computation is the sum of absolute differences

$$ d_t = \sum_{i,j} \left| f_{ij}^{t} - f_{ij}^{t-1} \right| $$



Both difference metrics are sensitive to large camera moves, leading to numerous false positives in cut detection. To
overcome this problem, many researchers either smooth video frames or down-sample them before taking the pixel-
based differences. This additional step often improves the precision rate of cut detection algorithms significantly.
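
Both pixel-based metrics, and the effect of the down-sampling step, fit in a few lines. The sketch assumes grayscale frames given as equally sized 2-D lists and uses non-overlapping k x k block averaging as the down-sampling step (one of several possible choices); the striped texture and one-pixel shift in the example are made up to show how a small camera move inflates the raw metrics but vanishes after averaging.

```python
def ssd(prev, curr):
    """Sum of squared differences between two equally sized 2-D frames."""
    return sum((a - b) ** 2 for rp, rc in zip(prev, curr) for b, a in zip(rp, rc))


def sad(prev, curr):
    """Sum of absolute differences between two equally sized 2-D frames."""
    return sum(abs(a - b) for rp, rc in zip(prev, curr) for b, a in zip(rp, rc))


def block_average(frame, k=2):
    """Down-sample by averaging non-overlapping k x k blocks (partial edge blocks dropped)."""
    h, w = len(frame) // k * k, len(frame[0]) // k * k
    return [[sum(frame[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
             for c in range(0, w, k)]
            for r in range(0, h, k)]


# A striped texture shifted by one pixel, as a small horizontal pan would do.
prev = [[0, 10, 0, 10] for _ in range(4)]
curr = [[10, 0, 10, 0] for _ in range(4)]
print(ssd(prev, curr), sad(prev, curr))               # 1600 160
print(sad(block_average(prev), block_average(curr)))  # 0.0
```
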

Distribution-based difference metrics, on the other hand, are less influenced by camera moves. A likelihood meas-
ure was first proposed in the late 1970s based on the intensity mean and variance of the current and the previous frame.
See Y. Z. Hsu, H.-H. Nagel, and G. Rekers, "New Likelihood Test Methods for Change Detection in Image Sequences,"
CVGIP, 26, pp. 73-106, 1984. If $\mu_t$ and $\sigma_t$ are used to represent the intensity mean and variance of frame $t$,
respectively, the likelihood-based difference metric can be defined as

$$ d_t = \frac{\left[ \dfrac{\sigma_t + \sigma_{t-1}}{2} + \left( \dfrac{\mu_t - \mu_{t-1}}{2} \right)^2 \right]^2}{\sigma_t \, \sigma_{t-1}} \qquad (1) $$
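
The likelihood-based metric needs only the intensity mean and variance of each frame. The sketch below implements the commonly used form d_t = [ (σ_t + σ_{t−1})/2 + ((μ_t − μ_{t−1})/2)² ]² / (σ_t σ_{t−1}), where σ denotes variance, assuming frames are flat lists of intensities. It is undefined for perfectly flat frames (zero variance), which a practical version would need to guard against.

```python
def mean_var(pixels):
    """Intensity mean and (population) variance of a flat list of pixels."""
    m = sum(pixels) / len(pixels)
    return m, sum((v - m) ** 2 for v in pixels) / len(pixels)


def likelihood_metric(prev, curr):
    """Likelihood-ratio difference metric; sigma denotes the variance here.

    Undefined for a frame with zero variance (division by zero).
    """
    mu0, s0 = mean_var(prev)
    mu1, s1 = mean_var(curr)
    num = ((s1 + s0) / 2 + ((mu1 - mu0) / 2) ** 2) ** 2
    return num / (s1 * s0)


print(likelihood_metric([0, 2, 0, 2], [0, 2, 0, 2]))  # 1.0 (identical statistics)
print(likelihood_metric([0, 2, 0, 2], [4, 6, 4, 6]))  # 25.0 (mean shifted by 4)
```

The value is 1 when the two frames share mean and variance and grows as they diverge, so a threshold above 1 separates "same" from "different".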



Other existing distribution-based difference measures are based on histograms. The image histogram of frame $t$
is denoted by $H_t$. Let $H_t(j)$ be the pixel count in bin $j$ of the histogram $H_t$. If the total number of bins is $N$, one can
compute either

$$ d_t = \sum_{j=1}^{N} \left| H_t(j) - H_{t-1}(j) \right| \qquad (2) $$



or 
Either one of them can be used as a distribution-based difference metric. The bin size, set through the number of
bins N, affects the sensitivity of the cut detection algorithm in either measure. The smaller the bin size is, the more
sensitive the histogram measure will be. In most existing methods, a fixed bin size is used throughout all intensity ranges.
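
To make the role of the bin count concrete, the sketch computes the bin-to-bin absolute histogram difference Σ_j |H_t(j) − H_{t−1}(j)| with N equal-width bins, assuming 8-bit intensities given as flat lists. Only this absolute form is shown; the example, with made-up pixel values, shows a one-level brightness shift registering with 256 bins but disappearing with 64 wider bins.

```python
def histogram(pixels, bins, levels=256):
    """Pixel counts in `bins` equal-width bins covering intensities 0..levels-1."""
    h = [0] * bins
    for v in pixels:
        h[v * bins // levels] += 1
    return h


def histogram_difference(prev, curr, bins=64, levels=256):
    """Bin-to-bin absolute difference sum_j |H_t(j) - H_{t-1}(j)|."""
    hp, hc = histogram(prev, bins, levels), histogram(curr, bins, levels)
    return sum(abs(a - b) for a, b in zip(hc, hp))


prev, curr = [100] * 4, [101] * 4                  # every pixel brightens by one level
print(histogram_difference(prev, curr, bins=256))  # 8
print(histogram_difference(prev, curr, bins=64))   # 0
```
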

Recently, a third type of distribution-based difference metric has been proposed. See I. K. Sethi and N. Patel, "A
Statistical Approach to Scene Change Detection," SPIE Vol. 2420, Storage and Retrieval for Image and Video Data-
bases III, pp. 329-338, 1995. This metric is based on the empirical distribution function (EDF) of the previous and cur-
rent frames. It is called the Kolmogorov-Smirnov test metric:

$$ d_t = \max_j \left| EDF_t(j) - EDF_{t-1}(j) \right| \qquad (4) $$

In order to compute the empirical distribution function, one first constructs the histogram of each individual video
frame. Assume that the histogram for frame $t$ is represented by $\{ H_t(j) \mid j = 1, \ldots, N \}$. $EDF_t(j)$, which
represents the cumulative distribution probability of the $j$th intensity value in frame $t$, can then be defined by

$$ EDF_t(1) = \frac{H_t(1)}{M}, \qquad EDF_t(j) = EDF_t(j-1) + \frac{H_t(j)}{M}, \quad j = 2, \ldots, N $$

where $M$ is the total number of pixels in the image.
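
Following these definitions, a minimal sketch of equation (4) that builds EDF_t recursively from the histogram and the pixel count M. Intensities are assumed to be integers in `0..levels-1`, so the text's bin j = 1, ..., N corresponds to list index j − 1 here; the function names are hypothetical.

```python
def edf(pixels, levels=256):
    """EDF_t(j): fraction of the M pixels with intensity <= j, built from the histogram."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    m = len(pixels)                          # M, the total number of pixels
    out = [hist[0] / m]                      # EDF_t(1) = H_t(1) / M
    for j in range(1, levels):
        out.append(out[-1] + hist[j] / m)    # EDF_t(j) = EDF_t(j-1) + H_t(j) / M
    return out


def ks_difference(prev, curr, levels=256):
    """Kolmogorov-Smirnov metric d_t = max_j |EDF_t(j) - EDF_{t-1}(j)|."""
    return max(abs(a - b) for a, b in zip(edf(curr, levels), edf(prev, levels)))


print(ks_difference([0, 0, 1, 1], [0, 1, 1, 1], levels=2))  # 0.25
```
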

After the difference metric is collected for each frame, most algorithms simply apply a preset threshold to this time
series of difference metrics. The algorithm compares the value of the difference metric with a global threshold. If it is
above this preset number, a shot change is signaled. Since it is difficult to find such a threshold for an entire video, some
researchers have also proposed the use of two thresholds. See, for example, H. Zhang, A. Kankanhalli, and S. W. Smo-
liar, "Automatic Parsing of Full-Motion Video," ACM Multimedia Systems, 1, pp. 10-28, 1993.

Recently, rank-based detection schemes have become very popular. See, for example, P. Aigrain and P. Joly, "The
Automatic Real-Time Analysis of Film Editing and Transition Effects and its Applications," Computer and Graphics, Vol.
18, No. 1, pp. 93-103, 1994.

Since global thresholds are hard to find, the idea is that thresholds should be applied only to difference metrics in
a local temporal neighborhood. This local temporal neighborhood approach is indicated as the dashed box attached to
the detection step in Figure 1. In order for this to work, a new difference metric is computed in the detection step from
every local temporal window centered around the current frame. Denote $d_t$ as the difference metric for frame $t$ in the
form of the sum of squared differences (or sum of absolute differences), and assume that the size of the temporal window
is $2N+1$. If the $2N+1$ observations $d_i$, $i = t-N, \ldots, t+N$, are ordered $d_{(1)} < d_{(2)} < \cdots < d_{(2N+1)}$, the new
metric $\tilde{d}_t$ is often computed in one of the following three ways.

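$$ \tilde{d}_t = \begin{cases} 0 & \text{if } d_t \neq d_{(2N+1)} \\ \dfrac{d_t}{d_{(2N)}} & \text{otherwise} \end{cases} \qquad (5) $$

or

$$ \tilde{d}_t = \begin{cases} 0 & \text{if } d_t \neq d_{(2N+1)} \\ \dfrac{d_t - d_{(2N)}}{d_{(2N)}} & \text{otherwise} \end{cases} \qquad (6) $$

or

$$ \tilde{d}_t = \begin{cases} 0 & \text{if } d_t \neq d_{(2N+1)} \\ \dfrac{d_t}{\frac{1}{2N} \sum_{i=1}^{2N} d_{(i)}} & \text{otherwise} \end{cases} \qquad (7) $$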



The preset threshold will then be applied to this sequence of new metrics $\tilde{d}_t$. Most algorithms that use Equation
5 or Equation 7 favor N = 2 and a number between 3 and 4 as the best threshold choice.
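
The local-window idea can be sketched as follows, assuming the two variants that, once d_t is the largest value in its window of 2N + 1, divide it either by the second-largest value or by the mean of the other 2N values. Leaving frames within N of either end at zero, and the tie handling, are assumptions of this sketch rather than details from the text.

```python
def rank_based_metric(d, n=2, mode="second"):
    """Local-window metric over windows of 2n + 1 difference values.

    The value for frame t is 0 unless d_t is the largest value in its window;
    otherwise it is d_t divided by the second-largest value (mode="second")
    or by the mean of the other 2n values (mode="mean"). Frames closer than
    n to either end of the sequence are left at 0 (an assumption).
    """
    out = [0.0] * len(d)
    for t in range(n, len(d) - n):
        window = sorted(d[t - n:t + n + 1])
        if d[t] != window[-1]:
            continue
        others = window[:-1]
        ref = others[-1] if mode == "second" else sum(others) / len(others)
        out[t] = d[t] / ref if ref > 0 else float("inf")
    return out


d = [1, 2, 1, 10, 2, 1, 1]
print(rank_based_metric(d, n=2))  # [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
```

With a threshold between 3 and 4, only frame 3 would be declared a cut in this made-up series; the ratio form makes the decision insensitive to the overall scale of the difference metric.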

All detection schemes discussed so far are based on preset thresholds, and they do not treat cut detection as a
hypothesis testing problem. On the other hand, there are a few cut detection algorithms that are based on hypothesis
testing paradigms. All such algorithms can be categorized into two different types. The first formulation views a
sequence of difference metrics as a set of samples drawn from a single known distribution, if they are derived from
images within the same shot. The second formulation views them as a set of samples drawn from a single but unknown
distribution. Note here that the difference metric could be computed from each individual frame or from any two neigh-
boring frames. These formulations transform the cut detection problem into the following two problems, respectively:

Case 1: Can one disprove, to a certain required level of significance, the null hypothesis that the difference metric is a
sample drawn from a known distribution?

Case 2: Can one disprove, to a certain required level of significance, the null hypothesis that the previous and the 
current difference metric are samples drawn from the same distribution? 

In both cases, either pixel-based or distribution-based difference metrics can be used. However, the underlying dis- 
tribution in the second case does not have to be known in advance. Each formulation will next be explained in more 
detail. 

Regarding Case 1, work on modeling frame difference signals can be traced back to the 1960s. Seyler first studied
the nature of frame difference signals and showed that the gamma distribution provides a good fit to the probability den-
sity function of the fraction of pixels whose change is above a threshold. See A. J. Seyler, "Probability Distributions of
Television Frame Differences," Proceedings of I.R.E.E. Australia, pp. 355-366, November 1965. An analytic expression
is derived for the probability density function approximating the experimentally recorded frame difference distributions,
and is simply the following:

$$ p(D) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, D^{\alpha-1} \exp\!\left( -\frac{D}{\beta} \right) $$

where $D$ is the fraction of pixels whose change is above a threshold, and the parameters $\alpha$ and $\beta$ are given by the
computed mean $\mu$ and standard deviation $\sigma$ of the experimental distribution, as

$$ \alpha = \frac{\mu^2}{\sigma^2}, \qquad \beta = \frac{\sigma^2}{\mu} $$

It is, however, not clear how sensitive this conclusion is to the choice of thresholds.

An alternative model for frame difference signals is that the corresponding difference $d_{ij} = f_{ij}^{t} - f_{ij}^{t-1}$ at pixel
location $(i,j)$ follows a zero-mean Gaussian distribution with variance $\sigma_{ij}^2$. The unknown parameter $\sigma_{ij}$ can be
estimated from the image sequence directly. To simplify the model, it is often assumed that the random variables are i.i.d.
(independently and identically distributed); therefore

$$ p(d_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{d_{ij}^2}{2\sigma^2} \right) \quad \text{for all } (i,j) $$

Following this, it is shown in T. Aach, A. Kaup, and R. Mester, "Statistical Model-Based Change Detection in Moving
Video," Signal Processing, 31, pp. 165-180, 1993, that

$$ \sum_{i,j} \frac{d_{ij}^2}{\sigma^2} $$

obeys a $\chi^2$ distribution with as many degrees of freedom as the number of pixels in the image.
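$d_{ij}$ can also be modeled as obeying a Laplace distribution, i.e.,

$$ p(d_{ij}) = \frac{\lambda}{2} \exp\!\left( -\lambda \left| d_{ij} \right| \right) $$

where

$$ \lambda = \frac{\sqrt{2}}{\sigma} $$

and $\sigma^2$ is the variance of this probability density function. It is then shown in Aach et al., op. cit., that

$$ \sum_{i,j} 2\lambda \left| d_{ij} \right| $$

follows a $\chi^2$ distribution with twice as many degrees of freedom as the number of pixels in the image.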
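
The two results from Aach et al. can be sanity-checked numerically: for n i.i.d. zero-mean Gaussian differences, Σ d_ij²/σ² should average n, and for n i.i.d. Laplace differences with λ = √2/σ, Σ 2λ|d_ij| should average 2n, the means of χ² distributions with n and 2n degrees of freedom. The simulation below uses made-up sizes and a fixed seed; it checks only those means, not the full distributions.

```python
import math
import random


def gaussian_statistic(diffs, sigma):
    """sum_ij d_ij^2 / sigma^2 (chi-square with len(diffs) dof under the Gaussian model)."""
    return sum(d * d for d in diffs) / sigma ** 2


def laplace_statistic(diffs, sigma):
    """sum_ij 2 * lam * |d_ij|, lam = sqrt(2) / sigma (chi-square with 2 * len(diffs) dof)."""
    lam = math.sqrt(2) / sigma
    return sum(2 * lam * abs(d) for d in diffs)


def laplace_sample(rng, sigma):
    """Draw one zero-mean Laplace value with variance sigma^2."""
    lam = math.sqrt(2) / sigma
    magnitude = rng.expovariate(lam)            # |d| ~ Exponential(lam)
    return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(0)
n, trials, sigma = 50, 2000, 3.0                # made-up image size and noise level
gauss_stats = [gaussian_statistic([rng.gauss(0, sigma) for _ in range(n)], sigma)
               for _ in range(trials)]
lap_stats = [laplace_statistic([laplace_sample(rng, sigma) for _ in range(n)], sigma)
             for _ in range(trials)]
print(sum(gauss_stats) / trials, sum(lap_stats) / trials)  # close to 50 and 100
```
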

Among distribution-based difference metrics, the difference metric constructed using equation (1) is based on the
assumption that images are samples drawn from $N(\mu_t, \sigma_t)$. This metric is essentially the maximum likelihood ratio

$$ \frac{L(H_1)}{L(H_0)} $$

where $L(x)$ is the likelihood of event $x$ and

$H_0$: frames $t-1$ and $t$ come from the same normal distribution;
$H_1$: frames $t-1$ and $t$ come from different distributions $N(\mu_{t-1}, \sigma_{t-1})$ and $N(\mu_t, \sigma_t)$.

For the difference metrics constructed using equations (2) and (3), it can also be shown that both satisfy the $\chi^2$ distribution. See J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference, Marcel Dekker, Inc., 1992. The $\chi^2$ test detects whether an observed frequency distribution in a sample arises from the assumed theoretical distribution. Such distributions could be a true binomial, Poisson, normal, or some other known type of distribution in the population. Usually, the parameters of this distribution are not known. It may be shown that, if $s$ parameters are estimated by the method of maximum likelihood, the limiting distribution of

$$\sum_{i=1}^{N} \frac{(n_i - e_i)^2}{e_i},$$

where $n_i$ and $e_i$ are the observed and expected frequencies in class $i$ of $N$ classes, is that of $\chi^2$ with $N-s-1$ degrees of freedom.
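A minimal sketch of the Pearson statistic applied to a pair of frames, assuming (as an illustration, since equations (2) and (3) are not reproduced in this section) that the previous frame's gray-level histogram supplies the expected frequencies $e_i$ and the current frame's histogram the observed frequencies $n_i$. The bin count and the function name are hypothetical choices, and no parameters are estimated, so $s = 0$.

```python
import numpy as np


def histogram_chi_square(prev, cur, bins=64):
    """Pearson statistic sum((n_i - e_i)^2 / e_i) between two frames' histograms.

    The previous frame's histogram gives the expected counts e_i and the
    current frame's histogram the observed counts n_i; classes with e_i == 0
    are skipped. Returns (statistic, degrees of freedom N - s - 1 with s = 0).
    """
    e, _ = np.histogram(prev, bins=bins, range=(0, 256))
    n, _ = np.histogram(cur, bins=bins, range=(0, 256))
    if e.sum() != n.sum():
        raise ValueError("frames must have the same number of pixels")
    mask = e > 0
    e, n = e[mask].astype(float), n[mask].astype(float)
    stat = float(np.sum((n - e) ** 2 / e))
    return stat, int(mask.sum()) - 1
```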
For Case 2, the most commonly used statistical approach of this type is the Kolmogorov-Smirnov test. The Kolmogorov-Smirnov test is concerned with the agreement of two sets of observed values, and the null hypothesis is that the two samples come from populations with the same distribution function $F(x)$. It can be shown that the test statistic $d_t$ as in Equation (4) satisfies

$$\mathrm{Prob}\{d_t > K_\alpha\} = \alpha,$$

where $K_\alpha$ represents a constant depending on the level of significance $\alpha$ and is available in tabular form from most statistics books. $N$ is the size of the image the statistic is collected upon.

Unlike the test discussed just above, the Kolmogorov-Smirnov test does not assume any a priori information about the type of distribution functions.
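The two-sample statistic and a large-sample critical value can be sketched as follows. The constant $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$, scaled by $\sqrt{(n+m)/(nm)}$ for samples of sizes $n$ and $m$, is the standard asymptotic approximation to the tabulated values; using it in place of a table is an assumption of this sketch, as are the function names.

```python
import math

import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a = np.sort(np.ravel(a).astype(float))
    b = np.sort(np.ravel(b).astype(float))
    x = np.concatenate([a, b])  # the empirical CDFs only jump at sample values
    cdf_a = np.searchsorted(a, x, side="right") / a.size
    cdf_b = np.searchsorted(b, x, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_critical_value(alpha, n, m):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_change(prev, cur, alpha=0.05):
    """True if the intensity distributions of the two frames differ at level alpha."""
    d = ks_statistic(prev, cur)
    return d > ks_critical_value(alpha, np.size(prev), np.size(cur))
```

For two 100-pixel samples at $\alpha = 0.05$, the critical value is about 0.192.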

The decision stage is often set up to eliminate detected cuts that are false positives due to flash light or slow motion in the scene. The list of detected cut frames is stored in a delay memory, as shown by the dashed box attached to the decision step in Figure 1. The criterion usually states that the minimum distance between two cuts has to be greater than a preset number.
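A minimal sketch of the minimum-distance criterion, assuming candidates are frame numbers and that the earliest candidate in a cluster is the one kept. This tie-breaking rule is a hypothetical choice; keeping the candidate with the largest difference metric would be an equally reasonable one.

```python
def enforce_min_gap(candidates, min_gap):
    """Keep a candidate cut only if it lies more than `min_gap` frames after the last kept cut.

    Candidates are processed in frame order, so the earliest cut in a cluster wins.
    """
    kept = []
    for frame in sorted(candidates):
        if not kept or frame - kept[-1] > min_gap:
            kept.append(frame)
    return kept
```

For example, with `min_gap=5` the candidates 10, 12, 40, 41, 90 reduce to 10, 40, 90.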

Most existing approaches are based on preset thresholds or improper assumptions which reduce their applicability to a limited range of video types. For example, incorrect assumptions have frequently been made about how shots are connected in videos, ignoring the realities of how films/videos are produced and edited. The nonstationary nature of the cut detection problem has often been ignored in the design of cut detection algorithms, as has the fact that many videos are converted from films.

In the following section, the cut detection problem will be examined from several different viewpoints. Since most shots and shot changes are created intentionally by film/video directors, it is very important to understand techniques commonly used by film/video production and editing professionals. This will provide an idea of what types of shots and cuts could be present in films/videos. The commonly used techniques to convert films to videos will also be reviewed. This understanding is very important for developing a robust cut detection algorithm. Finally, different assumptions made by existing methods will be explained and a solution in accordance with the invention will be described.

In the field of film/video production, a few shot classifications are established for directors, camera operators, and editors so that they can use the same language when talking about particular shots. There are three typical shots present in most films/videos:

— static shots, which are taken without moving the camera

The various types of static shots fall into five categories: close-up, close, medium, full, and long. Different static shots induce different amounts of change from frame to frame. For example, an object's movement in a close shot will produce more significant changes in comparison with a medium or long shot.

— camera moves

The various types of camera moves include zooms, tilts, and pans, or a mixture of the above. The change from frame to frame is a function of the speed at which the camera moves. For example, a fast camera pan will induce more significant changes in comparison with a slow camera pan.

— tracking shots, where the camera is moved during the shot

The most popular tracking shots include walking shots, dolly shots, and crane shots. All tracking shots have a tendency to induce unsteadiness, in particular when the object being tracked moves too fast. The amount of change from frame to frame is therefore also a function of how steadily the camera tracks the object.

Since camera/object motion in different shot types results in different amounts of intensity change, the criteria to detect cuts should be different when different types of shots are processed. Otherwise, many false positives and false negatives could occur and thus reduce the recall and precision rates. Unfortunately, in the type of problem this invention disclosure addresses, shot types are not known in advance. How to adjust the detection criteria to different shot types becomes one of the most important tasks in cut detection.

Another important issue is the fact that many videos are converted from films, because the two are played at different frame rates (24 frames per second for film versus about 30 for NTSC video). A common conversion process, called 3:2 pull-down, makes every other film frame a little bit longer. This process is often performed by a telecine machine and may affect many cut detection algorithms. As shown in Figure 2, there are actually four different ways that the 3:2 transfer can take place. In Figure 2, W, X, Y, Z are four film frames, each of which consists of two fields. By manipulating these field images, one can construct five video frames: (upper left) starting on field 1, alternating 3 fields; (upper right) starting on field 1, alternating 2 fields; (lower left) starting on field 2, alternating 3 fields; (lower right) starting on field 2, alternating 2 fields. Thus:

1. Starting on field 1, alternating 3 fields, then 2..3..2..3..2.
2. Starting on field 1, alternating 2 fields, then 3..2..3..2..3.
3. Starting on field 2, alternating 3 fields, then 2..3..2..3..2.
4. Starting on field 2, alternating 2 fields, then 3..2..3..2..3.
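The effect of 3:2 pull-down on a digitizer that keeps one field per video frame can be illustrated with a small simulation. The sketch below is a simplified model under stated assumptions: film frames are labeled by letters, each contributes 3 or 2 fields in alternation, consecutive fields are paired into interlaced video frames, and the digitizer keeps only the first (or second) field of each pair. All function names are hypothetical.

```python
from itertools import cycle


def pulldown_fields(film_frames, first_count=3):
    """Expand film frames into a field sequence with a 3:2 (first_count=3) or 2:3 cadence."""
    fields = []
    for frame, count in zip(film_frames, cycle([first_count, 5 - first_count])):
        fields.extend([frame] * count)
    return fields


def video_frames(fields, skip_first=0):
    """Pair consecutive fields into interlaced video frames (skip_first=1 starts on field 2)."""
    f = fields[skip_first:]
    return [(f[i], f[i + 1]) for i in range(0, len(f) - 1, 2)]


def digitize(frames, keep_field=0):
    """Keep one field per video frame, avoiding frames that mix two film pictures."""
    return [frame[keep_field] for frame in frames]


def duplicate_positions(seq):
    """Indices where a frame repeats its predecessor (near-zero frame difference)."""
    return [i for i in range(1, len(seq)) if seq[i] == seq[i - 1]]
```

For eight film frames A to H with a 3:2 cadence, `digitize(video_frames(pulldown_fields(list("ABCDEFGH"))))` yields A, A, B, C, D, E, E, F, G, H, so a repeated frame, and hence a near-zero inter-frame difference, appears once every five video frames.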

In any case, certain video frames are made up of two fields with totally different (although consecutive) pictures in them. As a result, the digitizer can only take one field from each video frame in order to maintain good picture quality. This will result in duplicated frames and almost zero inter-frame differences at five-frame intervals in all four cases of 3:2 pull-down. For cut detection methods that are based on pixel-based difference metrics, this may lead to numerous false positives. That is the reason why many existing algorithms base their detection on difference metrics collected in a five- or seven-frame interval centered around the current frame. See, for example, P. Aigrain and P. Joly, "The Automatic Real-Time Analysis of Film Editing and Transition Effects and its Applications," Computer and Graphics, Vol. 18, No. 1, pp. 93-103, 1994; H. Dubner, "Video Logging System and Method Thereof," International Patent Classification H04N9/79, 5/76, Application Filed November 1993; T. Koga, "Accurate Detection of a Drastic Change Between Successive Pictures," US Patent 5,032,905, July 16, 1991. A similar problem occurs in animated videos such as cartoons, except that near-zero inter-frame differences can occur as often as every other frame.
Most shot transitions are created by film/video directors. Before the invention of modern editing machines, this was often done on a splicer and an optical printer. The following are the most common shot transition and special effect types that can be created by such devices:

— fade in, the incoming scene gradually appears, starting from dark

— fade out, the outgoing scene gradually disappears, ending in black

— dissolve, an outgoing scene, which fades out, and an incoming scene, which fades in and overlaps the outgoing scene

— wipe, the incoming scene is used to literally wipe off the outgoing scene by means of a hard-edged or soft-edged line, as the incoming scene becomes solely and completely visible

— flip, a widening black stripe either on both sides or at the top and bottom of the picture frame that squeezes out the outgoing scene and simultaneously brings in the incoming scene, thus creating an extremely fast turnover, or revolving, effect

— freeze frame, a particular frame is selected to repeat for any desired length for dramatic or comedic effect

— flop-over, an effect often used to correct misdirection

— tail-to-head reversal, the tail, or end, of the cut becomes the head, and the head becomes the tail

— skip frames, the elimination of frames to speed up the action

— double printing, the duplication of frames to slow down the action

— blow up, the enlargement of frames to eliminate certain unwanted objects

— move-in, ending in a dissolve to another moving shot

— reposition, adjusting the original angle of a shot

— superimposure, the overlapping of two or more scenes which, unlike those in a dissolve, are simultaneously of constant relative strength

— montage, any sequence of superimposures

Modern computer technology allows even more ways to create shot transitions. The following are some examples. 

— band wipe, the incoming scene is revealed under the outgoing scene by horizontal or vertical bars 

— barn doors, the incoming scene is revealed under the outgoing scene from the center outwards 

— center merge, the outgoing scene splits into four parts and slides to the center to reveal the incoming scene 

— center split, the outgoing scene splits into four parts and slides to the corner to reveal the incoming scene 

— checkerboard, two sets of alternating boxes wipe to reveal the incoming scene

— cross stretch, the incoming scene stretches from an edge as the outgoing scene shrinks

For transitions made through visually abrupt straight cuts or camera breaks, the cut detection problem is a relatively well-defined task. However, detecting other types of transitions such as fades, dissolves, wipes, flips, superimposures, freeze or hold frames, flop-overs, tail-to-head reversals, blow-ups, move-ins, repositions, and skip frames may not be as straightforward. As a matter of fact, most scene transitions can be any length an editor wishes, as long as they are consistent with the style, mood, and pacing desired at that moment in the picture. Therefore, the difference metrics collected for these shot transitions are often indistinguishable from those associated with gradual camera moves, unless semantic-level information such as the image motion pattern is also considered. See, for example, H. Zhang, A. Kankanhalli, and S. W. Smoliar, "Automatic Parsing of Full-Motion Video," ACM Multimedia Systems, 1, pp. 10-28, 1993.

Many existing methods try to model various kinds of shot transitions. See, for example, P. Aigrain and P. Joly, "The Automatic Real-Time Analysis of Film Editing and Transition Effects and its Applications," Computer and Graphics, Vol. 18, No. 1, pp. 93-103, 1994; A. Hampapur, R. Jain, and T. Weymouth, "Digital Video Segmentation," Proc. ACM Multimedia Conference, pp. 357-363, 1994; J. Meng, Y. Juan, and S.-F. Chang, "Scene Change Detection in a MPEG Compressed Video Sequence," SPIE Vol. 2419, Digital Video Compression Algorithms and Technologies, pp. 14-25, 1995; JPEG ISO/IEC JTC1 SC29 WG1, JPEG, ISO/IEC 10918; T. Koga, "Accurate Detection of a Drastic Change Between Successive Pictures," US Patent 5,032,905, July 16, 1991; MPEG ISO/IEC JTC1 SC2; B.-L. Yeo and B. Liu, "Rapid Scene Analysis on Compressed Video," to appear in IEEE Trans. on Circuits and Systems for Video Technology, 1995. They often assume that both the incoming and outgoing shots are static scenes and that the transition lasts no longer than half a second. This type of model is too simplified to model the gradual shot transitions that are often present in films/videos.

Another assumption researchers often make is that the frame difference signal computed at each individual pixel can be modeled by a stationary, independently and identically distributed random variable which obeys a known probability distribution such as the Gaussian or Laplace. See, for example, H. Zhang, A. Kankanhalli, and S. W. Smoliar, "Automatic Parsing of Full-Motion Video," ACM Multimedia Systems, 1, pp. 10-28, 1993. This assumption is generally not true, as shown in Figure 3, which shows the histogram of a typical inter-frame difference image that does not correspond to a shot change. Note that the shape changes depending on whether the camera moves slowly (left) or fast (right). Neither a Gaussian nor a Laplace distribution fits both curves well. A Gamma distribution fits the left curve well, but not the right curve.

This fact invalidates cut detection approaches that are based on $\chi^2$ tests, since these tests are derived statistically from the above assumption. See, for example, G. Casella and R. L. Berger, Statistical Inference, Duxbury Press, 1990. In addition, existing methods assume that time-series difference metrics are stationary, completely ignoring the fact that such metrics are highly correlated time signals.

It is herein recognized that pixel-based and distribution-based difference metrics respond differently to different types of shots and shot transitions. For example, the former is very sensitive to camera moves, but is a very good indicator for shot changes. On the other hand, distribution-based metrics are relatively insensitive to camera and object motion, but can produce little response when two shots look quite different but have similar distributions. It is an object of the present invention to combine both measures in cut detection.

Unlike existing methods, which have no notion of time series or of non-stationarity, the present invention treats a sequence of difference metrics as a nonstationary time series signal and models the time trend deterministically. The sequence of difference metrics, no matter how it is computed, is just like any economic or statistical data collected over time. In this view, shot changes as well as the 3:2 pull-down process will both create observation outliers in the time series, while gradual shot transitions and gradual camera moves will produce innovation outliers. Fox defines the observation outlier to be one that is caused by a gross error of observation or a recording error, and it only affects a single observation. See A. J. Fox, "Outliers in Time Series," Journal of the Royal Statistical Society, Series B, 34, pp. 350-363, 1972.

Similarly, the innovation outlier is one that corresponds to the situation in which a single "innovation" is extreme. This type of outlier affects not only the particular observation but also subsequent observations. A typical model that represents observation outliers (occurring at $t = q$) is

$$d_t = \begin{cases} f(d_{t-1}, d_{t-2}, \ldots, d_1) + u_t, & t \neq q \\ f(d_{t-1}, d_{t-2}, \ldots, d_1) + u_t + \Delta, & \text{otherwise} \end{cases} \qquad \text{(Eq. 8)}$$

where $t$ represents the time index, $\Delta$ is the outlier, $f(d_{t-1}, d_{t-2}, \ldots, d_1)$ models the trend in the series, and

$$u_t = \sum_{r=1}^{p} \alpha_r u_{t-r} + z_t \qquad \text{(Eq. 9)}$$

In Equation (9), the $\alpha_r$ are autoregressive parameters and the $\{z_t\}$ are independently $N(0, \sigma_z^2)$ (zero-mean normal distribution with variance $\sigma_z^2$).

The model for innovation outliers is

$$d_t = f(d_{t-1}, d_{t-2}, \ldots, d_1) + u_t, \qquad u_t = \sum_{r=1}^{p} \alpha_r u_{t-r} + z_t + \Delta_t, \qquad \Delta_t = \begin{cases} \Delta, & t = q \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Eq. 10)}$$

where $\alpha_r$ and $\{z_t\}$ are defined as for Equation (9), and the outlier affects $d_t$ and, through it, subsequent observations.
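To make the difference between the two outlier types concrete, the sketch below simulates Equations (8) to (10) under simplifying assumptions that are not part of the original: a constant trend $f$, a first-order autoregressive term ($p = 1$) for $u_t$, and Gaussian $z_t$. The function name is hypothetical. Comparing each series with an outlier-free copy generated from the same noise shows that an observation outlier changes only $d_q$, whereas an innovation outlier changes $d_{q+k}$ by $\Delta\alpha^k$ for every $k \ge 0$.

```python
import numpy as np


def simulate_series(n, trend, alpha, sigma_z, q, delta, kind, seed=0):
    """Simulate d_t = trend + u_t with AR(1) noise u_t = alpha * u_{t-1} + z_t.

    kind="observation": delta is added to d_q only (Eq. 8).
    kind="innovation":  delta enters u_q and decays into later samples (Eq. 10).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma_z, n)
    u = np.zeros(n)
    for t in range(n):
        shock = delta if (kind == "innovation" and t == q) else 0.0
        u[t] = (alpha * u[t - 1] if t > 0 else 0.0) + z[t] + shock
    d = np.broadcast_to(np.asarray(trend, dtype=float), (n,)) + u  # new, writable array
    if kind == "observation":
        d[q] += delta
    return d
```

In cut detection terms, the single-sample spike of the first kind resembles a straight cut or a pull-down dip, while the slowly decaying response of the second kind resembles a gradual transition or camera move.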

There are standard methods in the literature which detect both types of outliers. See, for example, B. Abraham and A. Chuang, "Outlier Detection and Time Series Modeling," Technometrics, Vol. 31, No. 2, pp. 241-248, May 1989; A. J. Fox, "Outliers in Time Series," Journal of the Royal Statistical Society, Series B, 34, pp. 350-363, 1972; L. K. Hotta and M. M. C. Neves, "A Brief Review of Tests for Detection of Time Series Outliers," ESTADISTICA, 44, 142, 143, pp. 103-148, 1992.

These standard methods, however, cannot yet be applied to the cut detection problem directly, for the following three reasons. First, most methods require intensive computation, for example least squares, to estimate the time trend and autoregressive coefficients. This amount of computation is generally not desired. Second, the observation outliers created by slow motion and the 3:2 pull-down process could occur as often as one in every other sample, making the time trend and autoregressive coefficient estimation an extremely difficult process. Finally, since gradual shot transitions and gradual camera moves are indistinguishable in most cases, locating gradual shot transitions requires not only detection of innovation outliers but also an extra camera motion estimation step.

In the solution in accordance with the present invention, a zeroth-order autoregressive model and a piecewise-linear function are used to model the time trend. With this simplification, samples from both the past and the future must be used in order to improve the robustness of time trend estimation. More than half of the samples are discarded, because the observation outliers created by slow motion and the 3:2 pull-down process could occur as often as one in every other sample. Fortunately, these types of observation outliers are the smallest in value and can therefore be identified easily. After the time trend is removed, the remaining value is tested against a normal distribution $N(0, \sigma)$, in which $\sigma$ can be estimated recursively or in advance.
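The following Python sketch illustrates the idea under assumptions that are not taken from the original: windows of five samples on each side, the two largest samples in each window kept (so more than half are discarded), a straight-line trend through the two resulting levels placed at the window centres, and a one-sided 3σ test on the residual. Function names and parameter values are hypothetical.

```python
import numpy as np


def robust_level(window, keep):
    """Mean of the `keep` largest samples; smaller ones (e.g. pull-down dips) are discarded."""
    w = np.sort(np.asarray(window, dtype=float))
    return float(w[-keep:].mean())


def detect_observation_outliers(d, sigma, half_window=5, keep=2, z_thresh=3.0):
    """Flag frames whose difference metric rises far above a locally linear trend.

    The trend at t is a straight line through the robust past and future levels,
    placed at the centres of the two windows; with equal window sizes its value
    at t is their average. Frames within `half_window` of either end are not tested.
    """
    d = np.asarray(d, dtype=float)
    cuts = []
    for t in range(half_window, len(d) - half_window):
        past = robust_level(d[t - half_window:t], keep)
        future = robust_level(d[t + 1:t + 1 + half_window], keep)
        residual = d[t] - 0.5 * (past + future)  # detrended value
        if residual / sigma > z_thresh:          # one-sided test against N(0, sigma)
            cuts.append(t)
    return cuts
```

With $d_t = 10 + 0.1t$, pull-down dips of 0.5 at every fifth frame, a single spike of 60 at frame 17, and $\sigma = 2$, only frame 17 is flagged: the dips fall below the trend and are never kept when estimating it, and a spike inside a neighbour's window only raises that neighbour's trend.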

To make the cut detection method more robust, a further application is made of the Kolmogorov-Smirnov test to eliminate false positives. This test is chosen because it does not assume a priori knowledge of the underlying distribution function. The traditional Kolmogorov-Smirnov test procedure compares the computed test metric with a preset significance level (normally 95%). It has been used by some researchers to detect cuts in videos. See, for example, I. K. Sethi and N. Patel, "A Statistical Approach to Scene Change Detection," SPIE Vol. 2420, Storage and Retrieval for Image and Video Databases III, pp. 329-338, 1995. This use of a single preselected significance level completely ignores the nonstationary nature of the cut detection problem. It is herein recognized that the Kolmogorov-Smirnov test is properly used only if it takes into account the nonstationary nature of the problem. In other words, the significance level should be automatically adjusted to different types of video content.

One way to represent video content is to use measurements in both the spatial and the temporal domain together. For example, image contrast is a good spatial-domain measurement, and the amount of intensity change across two neighboring frames measures video content in the temporal domain. The adjustment should be made such that:

the higher the image contrast is (that is, the less the significance level), the more sensitive the cut detection mechanism should be, and

the more changes occur in two consecutive images, the less sensitive the detection mechanism (that is, the higher the significance level) should be.

The traditional Kolmogorov-Smirnov test also cannot differentiate a long shot from a close-up of the same scene. To guard against such transitions, the present invention utilizes a hierarchical Kolmogorov-Smirnov test. In this hierarchical Kolmogorov-Smirnov test, each frame is divided into four rectangular regions of equal size, and the traditional Kolmogorov-Smirnov test is applied to every pair of corresponding regions as well as to the entire image. This test therefore produces five binary numbers which indicate whether there is a change in the entire image as well as in each of the four sub-images.
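A sketch of the hierarchical test, assuming NumPy, the asymptotic two-sample critical value $c(\alpha)\sqrt{(n+m)/(nm)}$ in place of tabulated constants, and quadrants obtained by halving the frame in each direction (for odd sizes the quadrants differ by one row or column). The function names are hypothetical.

```python
import math

import numpy as np


def _ks_change(a, b, alpha):
    """Two-sample Kolmogorov-Smirnov decision using the large-sample critical value."""
    a = np.sort(np.ravel(a).astype(float))
    b = np.sort(np.ravel(b).astype(float))
    x = np.concatenate([a, b])
    d = np.max(np.abs(np.searchsorted(a, x, side="right") / a.size
                      - np.searchsorted(b, x, side="right") / b.size))
    crit = math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((a.size + b.size) / (a.size * b.size))
    return bool(d > crit)


def hierarchical_ks(prev, cur, alpha=0.05):
    """Five change flags: [whole image, top-left, top-right, bottom-left, bottom-right]."""
    h, w = prev.shape
    h2, w2 = h // 2, w // 2
    regions = [
        (slice(0, h), slice(0, w)),
        (slice(0, h2), slice(0, w2)), (slice(0, h2), slice(w2, w)),
        (slice(h2, h), slice(0, w2)), (slice(h2, h), slice(w2, w)),
    ]
    return [_ks_change(prev[r], cur[r], alpha) for r in regions]
```

As an example of why the hierarchy helps, a frame whose right half is white compared with a frame whose bottom half is white has an identical global histogram, so the whole-image flag is False, but the flags come out as [False, False, True, True, False].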

Finally, instead of directly using these five binary numbers to eliminate false positives, the test result is only used in a qualitative sense. The significance of the test result for the shot change frame is compared against that of its neighboring frames. These modifications will become better understood in a later portion of the description.

Despite many claims and attempts in the literature regarding detecting gradual shot changes, it is not possible to differentiate gradual shot transitions from gradual camera moves, since the difference metrics associated with both cases are often indistinguishable. Zhang et al. present a hybrid algorithm that tries to differentiate gradual shot transitions from gradual camera moves based on camera motion analysis. See H. Zhang, C. Y. Low, and S. W. Smoliar, "Video Parsing and Browsing Using Compressed Data," Multimedia Tools and Applications, 1, pp. 89-111, 1995. However, it should be noted that camera motion estimation has been an active area of research in computer vision for more than a decade, and a reliable motion estimation algorithm is yet to be seen. In the context of the present invention, it is recognized herein that, since no single cut detection algorithm can deliver a 100% recall rate, a cut browsing apparatus is required to provide access to misdetected shots. This browsing apparatus could also be used to identify gradual shot transitions. How the browsing apparatus can provide information to users without the need to play back the entire video is the most challenging task in designing such an apparatus.

In order to provide information to a user who can then identify missed and misdetected shots, this browsing apparatus should have the following three characteristics:

First, it must present information different from what the automatic cut detection algorithm extracts from the video. In addition, it should not try to present interpreted information to the user, since any such information could be a source of error. Finally, the browsing apparatus should be fast and concise, providing another level of abstraction such that the need to play back the entire video can be avoided.

In the context of the present invention, the browsing apparatus comprises a video player, a video browser, and a cut browser. See Figure 4. The browsing apparatus forms the subject matter of copending U.S. Patent Application Serial No. 08/576,271, entitled CUT BROWSING AND EDITING APPARATUS, filed on even date herewith in the name of the present inventors, the disclosure of which is herein incorporated by reference. The video browser forms the subject matter of U.S. Patent Application Serial No. 08/221,225, entitled REPRESENTING CONTENTS OF A SINGLE VIDEO SHOT USING RFRAMES, in the name of Arman et al., included (without its claims) in the present application as Appendix 1.

These three components provide three different levels of abstraction. Further details will be presented later, but briefly, the cut browser presents two cross-section images, one in the horizontal direction and the other in the vertical direction of the video volume. Each cross-section image is constructed by sampling one row (or column) from every image, reducing the amount of information from a two-dimensional image to two one-dimensional image strips, as shown in Figure 5, in which (a) shows two cross sections in video volumes and (b) shows a space-time image made up of two cross sections.

The horizontal and vertical cross-section images are combined, in the cut browser, into one image segmented into two bands according to the list of detected shots. This representation provides a level of abstraction just sufficient to reveal whether there is a missed or misdetected shot.

This representation allows the user to easily search the content of the video, to decide if the cut detection method may have missed or misdetected a shot, and to create or eliminate cuts in preparation for multimedia content.

This particular set of three levels of abstraction is chosen for the following reasons. First, the original video must be included because it is the only raw information available. However, no one can afford to search for information in the original video directly, because video playback is a very time-consuming task. That is a reason to include the representative frames (Rframes) of each detected shot. Any automatic cut detection method can be used to provide this level of abstraction. Second, since no cut detection method offers a 100% accuracy rate, there will always be missed or misdetected shots. In order to give the user a clue, so as to avoid being affected by the imperfection of any cut detection method (or any automatic method, for that matter) in the search for a desired video clip, additional information should be provided that has the characteristics referred to above:

— it must be completely different from and independent of what the automatic cut detection algorithm extracts from the video

— it should be raw information, but should not be the original video

— it should be concise

It is believed that the best choice is the cross-section image. A cross-section image is an image generated directly from the original video. The present implementation selects one in the horizontal and the other in the vertical direction of the video volume, although, in general, one can select any number of directions. The process of constructing a cross-section image is illustrated in Figure 5. The horizontal (vertical) cross-section image is constructed by sampling the middle row (or the middle column) from every image and by collecting all samples over time. To provide a complete view of both cross-section images, they are combined into one image, and then the image is segmented into two bands according to the list of detected shots.
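A minimal sketch of the cross-section generator, assuming frames arrive as NumPy arrays of equal size and that time runs along the horizontal axis of the space-time image. The vertical stacking in `combined_view` is an assumed layout, segmenting the result into bands according to the shot list is not shown, and the function names are hypothetical.

```python
import numpy as np


def cross_section_images(frames):
    """Build horizontal and vertical cross-section images from a frame sequence.

    Column t of the horizontal image is the middle row of frame t;
    column t of the vertical image is the middle column of frame t.
    """
    rows, cols = [], []
    for frame in frames:
        f = np.asarray(frame)
        h, w = f.shape[:2]
        rows.append(f[h // 2, :])
        cols.append(f[:, w // 2])
    return np.stack(rows, axis=1), np.stack(cols, axis=1)


def combined_view(frames):
    """Stack both cross sections into one space-time image (time runs left to right)."""
    horizontal, vertical = cross_section_images(frames)
    return np.vstack([horizontal, vertical])
```

Within a shot, neighbouring columns change gradually, giving the continuous pattern described above; an abrupt cut shows up as a sharp change between two adjacent columns.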

This representation provides a level of abstraction which reveals the continuity of video frames. For example, if 
there is a missed shot which is the result of a visually abrupt cut or camera break, a clear discontinuous pattern is dis- 
cernible as shown in Figure 6(a). For cuts that are associated with gradual shot transitions such as dissolves, a blurred 
discontinuous pattern is apparent as shown in the bottom two pictures of Figure 6(b). 

Five components are utilized herein: a video player, a video browser, a cut browser, an Rframe generator, and a cross-section generator. Their inter-relationship is shown in Figure 4. In this figure, the video source can be either an analog or a compressed digital video, and the automatic cut detection module can be any automatic cut detection method as long as it outputs a list of shots in frame numbers. The video player, the video browser, and the cut browser are the three major components, which are supported by two other components, the Rframe generator and the cross-section generator. The activities in all components are completely synchronized.

The Rframe generator takes the shot list as well as the original video and produces thumbnail pictures that represent each shot. Whenever the shot list is updated, it updates the list of thumbnail pictures. The video player plays back the original video and accepts requests coming from either the video browser or the cut browser. The playback can start from any video frame. In order to locate the desired video frame, either the time code or the byte offset is used, depending on whether the video source is an analog video or a digital compressed video, respectively.

The video player also has a VCR-like user interface which implements functions such as pause, play, slow motion, fast forward, and rewind. It also provides a shot jump capability allowing the user to skip shots which are detected by the automatic cut detection method.

The video browser displays a list of thumbnail pictures representing each detected shot. It allows a user to quickly glance through a video to find the clip of interest. The desired clip will be found if it is among the list of detected shots. The video browser will properly position itself by taking information from either the video player or the cut browser. When a video is being played or the cross-section image is clicked, the representative frame of the associated shot will be highlighted. When the cross-section image scrolls, the list of thumbnail pictures will scroll accordingly.

To provide information for missed or misdetected shots, the cut browser presents the cross-section image generated from the raw video. In the foregoing description, the types of patterns that could appear in the cross-section image when there is a shot change were described and explained. These patterns of cross-section images provide useful information for manual identification of missed or misdetected shots. When a user clicks on the cross-section image, the cut browser maps the mouse location to a frame number and sends the request to both the video player and the video browser. The video will start playing from that frame, and the associated representative frame in the video browser will be highlighted.
The foregoing exemplary embodiment has been described in terms of two cross-section images; however, it is within the contemplation of the invention that a plurality of such images, which may number more than two, can be utilized.

The user interface of the current invention is shown in Figure 15. In operation, three windows are initially displayed on the screen. To search the content of the video, the user would examine each representative frame in the video browser to see whether it is the desired shot. When the list of thumbnail pictures is being scrolled, the content of the cut browser will be updated to match the representative frame that is being highlighted. The user would clearly see if there is any missed shot between the current and the next shot. When in doubt, the user can simply click on the point in the cut browser which might start a missed shot to look at the content in the raw video. To decide if the cut detection method may have missed or misdetected a shot, the user could simply examine the cross-section image in the cut browser. If the user sees any pattern that might represent a missed or a misdetected shot, the user would then click on the point in the cut browser to look at the content of the raw video.

The cut browser also provides the editing capability. This editing function is extremely useful for authoring content-indexed multimedia materials. It allows a user to break any shot into two, or merge any two shots into one, through a mouse or a button click. When this event happens, the cross-section image will be reorganized into two bands and the shot list is updated. The Rframe generator will then update the list of representative frames.
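The break and merge operations can be sketched on a shot list represented, as an assumption of this illustration, by a sorted list of shot start frames; the function names are hypothetical. Splitting inserts a new boundary, and merging removes the boundary between a shot and its successor, after which the Rframe list and the two-band cross-section display would be refreshed.

```python
import bisect


def split_shot(shot_starts, frame):
    """Break the shot containing `frame` into two by adding a boundary at `frame`.

    `shot_starts` is a sorted list of first-frame numbers, one per shot.
    """
    starts = list(shot_starts)
    i = bisect.bisect_left(starts, frame)
    if i == len(starts) or starts[i] != frame:  # ignore an existing boundary
        starts.insert(i, frame)
    return starts


def merge_shots(shot_starts, index):
    """Merge shot `index` with the next shot by removing the boundary between them."""
    if not 0 <= index < len(shot_starts) - 1:
        raise IndexError("shot has no successor to merge with")
    return list(shot_starts[:index + 1]) + list(shot_starts[index + 2:])
```

For example, splitting shots starting at 0, 100, 250 at frame 180 gives 0, 100, 180, 250, and merging shot 1 of that list with its successor restores 0, 100, 250.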

For example, when all video frames are in the same shot and all shots are correctly detected, a clear continuous pattern is seen in this cross-section image, as explained above in reference to the example shown in Figure 6.

In addition, since these are also raw image data, no interpretation is presented, and therefore no further error can be introduced.

The problem analysis described in the previous sections is incorporated in the present inventive solution to the cut detection problem, which allows a partitioning of a video into a set of shots comprising visually abrupt cuts or camera breaks. Figure 7 shows a flow diagram of a cut detection algorithm in accordance with the invention.

Unlike prior art methods, which use either a pixel-based or a distribution-based difference metric, the present invention integrates both types of information in a new detection scheme. The invention includes an adaptive engine to provide the detection scheme with the ability to automatically adjust to different types of video content. The result from the automatic cut detection is presented to the user through a cut browsing apparatus, where three tasks can be performed: to search the content of the video, to decide if the cut detection method may have missed or misdetected a shot, and to create or eliminate cuts.

The cut detection is herein formulated as a time series outlier detection problem. As shown in Figure 7, the system can take either uncompressed or compressed continuous video as the input. Experimentally, MPEG-1 compressed video has been utilized, and a sequence of DC images has been generated using the approach described in B.-L. Yeo and B. Liu, "On the Extraction of DC Sequence from MPEG Compressed Video," Proc. of ICIP, October 1995. Both the pixel-based and the distribution-based difference metrics are then computed from these DC images (difference metric collection), while video contents are measured to provide up-to-date test criteria (adaptation). Information from both the distribution-based and pixel-based difference metrics is fused (detection), after taking into account the new test criteria (significance level) provided by the adaptation step. Finally, a list of scene change frame candidates is produced and filtered, resulting in the final scene change frame list (decision).

DC images are reduced images formed from the collection of scaled DC coefficients in intra-coded DCT-compressed video. Such images can be directly reconstructed from JPEG or movie-JPEG videos. For MPEG and H.261 videos, fast reconstruction of DC images from motion-compensated pictures requires some level of approximation. The technique described in B.-L. Yeo and B. Liu, "On the Extraction of DC Sequence from MPEG Compressed Video," Proc. of ICIP, October 1995, is herein applied to generate DC luminance images from every single frame of MPEG video, while the chrominance components are simply discarded. Briefly, the technique extracts the DC image corresponding to motion-compensated pictures directly from the compressed stream in the following manner. It first locates, for each block of interest, the four original neighboring blocks from which the current block of interest is derived. It then approximates the DC term of each block of interest by the weighted sum of the DCs of the four original neighboring blocks, where the weights are simply the fractions of area occupied by the current block.
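The weighted-sum step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Yeo and Liu's code: it assumes 8×8 blocks, an integer-pel motion vector `(dx, dy)`, a reference DC image stored as a list of rows (one value per block), and a predicting region that lies inside the picture. The function name `approx_dc` and its parameters are hypothetical, and only the weighted-sum approximation described above is covered.

```python
def approx_dc(ref_dc, bx, by, dx, dy, block=8):
    """Approximate the DC term of a motion-compensated block.

    ref_dc: DC values of the reference picture, ref_dc[row][col] (one per block).
    (bx, by): column and row index of the current block of interest.
    (dx, dy): integer-pel motion vector (hypothetical representation).
    """
    # Top-left pixel of the predicting region in the reference picture.
    x0 = bx * block + dx
    y0 = by * block + dy
    # Reference block containing that pixel, and the offset inside it.
    col, row = x0 // block, y0 // block
    ox, oy = x0 - col * block, y0 - row * block
    total = 0.0
    # The region overlaps at most four original neighboring blocks; each DC
    # is weighted by the fraction of the block area it contributes.
    for r, h in ((row, block - oy), (row + 1, oy)):
        for c, w in ((col, block - ox), (col + 1, ox)):
            if h > 0 and w > 0:
                total += ref_dc[r][c] * (h * w) / (block * block)
    return total


ref = [[10, 20], [30, 40]]
print(approx_dc(ref, 0, 0, 4, 4))  # 25.0 (region straddles all four blocks equally)
```

When the motion vector is a multiple of the block size, the region coincides with a single reference block and its DC is returned unchanged.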

It is noted, however, that this technique is not necessary in the realization of the present invention. Alternatively, one can take uncompressed video frames and subsample them to create images of the same size to feed into the apparatus in accordance with the invention.
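For the uncompressed path, one simple way to obtain images of the same size as DC images is to average each 8×8 block, since the DC coefficient of an 8×8 DCT is proportional to the block mean. The sketch below assumes grayscale frames stored as lists of rows whose dimensions are multiples of the block size. `block_mean_image` is a hypothetical name, and block averaging is one possible subsampling choice rather than the one prescribed by the text.

```python
def block_mean_image(frame, block=8):
    """Subsample a grayscale frame by averaging non-overlapping block x block tiles."""
    rows, cols = len(frame) // block, len(frame[0]) // block
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            tile = [frame[y][x]
                    for y in range(r * block, (r + 1) * block)
                    for x in range(c * block, (c + 1) * block)]
            out_row.append(sum(tile) / len(tile))
        out.append(out_row)
    return out


# A 16x16 frame: left half black, right half white -> 2x2 reduced image.
frame = [[0] * 8 + [255] * 8 for _ in range(16)]
print(block_mean_image(frame))  # [[0.0, 255.0], [0.0, 255.0]]
```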

In this step, both pixel-based and distribution-based difference metrics are collected, as shown in Figure 8, which shows the sub-diagram of the difference metric collection step. In accordance with the invention, the pixel-based difference metric for each frame $t$ is the sum of absolute frame differences,

$$d_t = \sum_{i,j} \left| f^{t}_{ij} - f^{t-1}_{ij} \right|,$$

where $f^{t}_{ij}$ represents the intensity value at pixel location $(i,j)$ in frame $t$.
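As an illustration of the pixel-based metric, the sketch below computes the sum of absolute frame differences for every frame of a sequence of DC images. It assumes each image is a list of rows of intensity values, all of equal size; the function names are hypothetical.

```python
def pixel_difference(prev, curr):
    """Sum of absolute intensity differences between two equally sized images."""
    return sum(abs(c - p)
               for prev_row, curr_row in zip(prev, curr)
               for p, c in zip(prev_row, curr_row))


def pixel_metric_series(images):
    """Pixel-based difference metric for frames 1..T-1 (frame 0 has no predecessor)."""
    return [pixel_difference(images[t - 1], images[t]) for t in range(1, len(images))]


imgs = [[[1, 2], [3, 4]], [[2, 2], [1, 5]], [[2, 2], [1, 5]]]
print(pixel_metric_series(imgs))  # [4, 0]
```

The second value is zero because the third image duplicates the second, which is exactly the kind of dip that later appears as a type II observation outlier.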

The distribution-based difference metric is the traditional Kolmogorov-Smirnov test metric, except that one metric is computed herein for the entire image and one for each of its four equally divided sub-regions.
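A minimal sketch of this hierarchical Kolmogorov-Smirnov statistic follows: the maximum distance between the empirical distribution functions (built from intensity histograms) of consecutive images, computed once for the entire image and once for each quadrant. It assumes integer intensities in `range(levels)` and images stored as lists of rows; the helper names are hypothetical.

```python
def edf(pixels, levels=256):
    """Empirical distribution function built from an intensity histogram."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    out, acc = [], 0
    for h in hist:
        acc += h
        out.append(acc / len(pixels))
    return out


def ks_distance(a, b, levels=256):
    """Two-sample Kolmogorov-Smirnov statistic: max |EDF_a(j) - EDF_b(j)|."""
    return max(abs(x - y) for x, y in zip(edf(a, levels), edf(b, levels)))


def quadrants(img):
    """Pixels of the four equally divided sub-regions (TL, TR, BL, BR)."""
    h, w = len(img), len(img[0])

    def region(r0, r1, c0, c1):
        return [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]

    return [region(0, h // 2, 0, w // 2), region(0, h // 2, w // 2, w),
            region(h // 2, h, 0, w // 2), region(h // 2, h, w // 2, w)]


def hierarchical_ks(prev, curr, levels=256):
    """KS statistic for the entire image plus one per quadrant."""
    whole = ks_distance([v for row in prev for v in row],
                        [v for row in curr for v in row], levels)
    parts = [ks_distance(a, b, levels) for a, b in zip(quadrants(prev), quadrants(curr))]
    return whole, parts


prev = [[0] * 4 for _ in range(4)]
curr = [[255 if r < 2 and c < 2 else 0 for c in range(4)] for r in range(4)]
print(hierarchical_ks(prev, curr))  # (0.25, [1.0, 0.0, 0.0, 0.0])
```

The example shows why the sub-regions matter: a change confined to one quadrant produces a large statistic there but only a modest one for the entire image.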

The purpose of this step is to provide the detection scheme with the ability to adjust automatically to different video contents. Figure 9 shows the sub-diagram of the adaptation step. As shown in Figure 9, statistics from each DC image and each pair of DC images are first collected to represent the current video content: the image contrast estimate and the motion estimate. The image contrast estimate is computed with a recursive scheme to suppress the influence of sudden lighting changes.

$$\text{contrast}_t = (1-\tau)\,\text{contrast}_{t-1} + \tau\,\sigma_{t-1}$$

where $\sigma_{t-1}$ is the intensity variance estimate of the DC image at time $t-1$, and $\tau$ equals 0.6 in the present exemplary embodiment.

Similarly, the motion estimate is computed as follows:

$$\text{motion}_t = (1-\tau)\,\text{motion}_{t-1} + \tau\,\frac{1}{N}\sum_{i,j}\left| f^{t-1}_{ij} - f^{t-2}_{ij} \right|$$

where $f^{t-1}_{ij}$ is the intensity value at pixel location $(i,j)$ of the DC image at time $t-1$, $N$ is the size of the image, and $\tau$ equals 0.6 in the present exemplary embodiment.
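The two recursive estimates can be sketched as exponential smoothing driven by the most recent DC images, with $\tau = 0.6$. The class name, the zero initial values, the use of the population variance as the intensity variance estimate, and the choice of the mean absolute difference between the DC images at times $t-1$ and $t-2$ as the motion term are all assumptions made for this sketch.

```python
def intensity_variance(img):
    """Population variance of the intensities of one DC image."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def mean_abs_difference(a, b):
    """Mean absolute difference between two equally sized DC images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


class ContentEstimator:
    """Recursive contrast and motion estimates (hypothetical helper)."""

    def __init__(self, tau=0.6, contrast=0.0, motion=0.0):
        self.tau = tau
        self.contrast = contrast   # initial values are assumptions
        self.motion = motion

    def update(self, dc_t2, dc_t1):
        """dc_t1, dc_t2: DC images at times t-1 and t-2. Returns (contrast_t, motion_t)."""
        t = self.tau
        # Blending with the previous estimate damps a single sudden lighting change.
        self.contrast = (1 - t) * self.contrast + t * intensity_variance(dc_t1)
        self.motion = (1 - t) * self.motion + t * mean_abs_difference(dc_t1, dc_t2)
        return self.contrast, self.motion


est = ContentEstimator()
print(est.update([[0, 0], [0, 0]], [[0, 2], [0, 2]]))  # (0.6, 0.6)
```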

Both the image contrast and the motion estimates are then fed into a fuzzy engine to compute a new significance level for the hierarchical Kolmogorov-Smirnov test. The fuzzy engine uses a quadratic membership function, where each contrast measurement is divided into four classes (low, middle, high, and extremely high), each motion estimate into three classes (slow, middle, and fast), and each significance level into five classes (high, middle high, middle, middle low, and low). Figures 10, 11, and 12 illustrate the membership functions of the fuzzy engine in accordance with the invention: they show the membership functions of the contrast estimate, of the motion estimate, and of the quantity proportional to the significance level, respectively.

Based on these definitions of membership functions, the fuzzy rules are stated in a simple IF/THEN format, where values are combined using AND (minimum) or OR (maximum) operations. All rules are listed in the following Table 1.

TABLE 1

IF contrast is low and motion is slow, THEN significance level is middle
IF contrast is middle and motion is slow, THEN significance level is middle low
IF contrast is high and motion is slow, THEN significance level is middle low
IF contrast is extremely high and motion is slow, THEN significance level is low
IF contrast is low and motion is fast, THEN significance level is high
IF contrast is middle and motion is fast, THEN significance level is middle high
IF contrast is high and motion is fast, THEN significance level is middle
IF contrast is extremely high and motion is fast, THEN significance level is middle low
IF contrast is low and motion is middle, THEN significance level is high
IF contrast is middle and motion is middle, THEN significance level is middle high
IF contrast is high and motion is middle, THEN significance level is middle high
IF contrast is extremely high and motion is middle, THEN significance level is middle



Since the rules give different values for the significance level, they must be resolved, or defuzzified, to yield a crisp final output value. Here, the centroid method of Zadeh et al. is used to find the center of gravity of the combined output shape, ensuring that all rules contribute to the final crisp result. For the centroid method, see L. A. Zadeh and J. Kacprzyk, Fuzzy Logic for the Management of Uncertainty, John Wiley & Sons, Inc., 1992. It is noted that, although the above fuzzy membership functions and rules serve the purpose of the invention, neither these membership functions nor the fuzzy rules have been optimized.
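The inference step can be sketched as follows. The rule base is Table 1; AND is the minimum, rules that share an output class are combined by maximum, and the crisp value is the centroid of the aggregated shape. Everything else is an assumption: the inputs are taken to be normalised to [0, 1], the quadratic membership functions use the form max(0, 1 - ((x - c)/w)^2) with hypothetical centres and widths, and the output lives on a hypothetical [0, 1] scale on which larger values correspond to the "high" significance class. How that scale maps to an actual significance level is not specified here, and the centroid is approximated on a uniform grid.

```python
def quad_mf(center, width):
    """Quadratic membership: 1 at `center`, falling to 0 at center +/- width."""
    return lambda x: max(0.0, 1.0 - ((x - center) / width) ** 2)


# Hypothetical class centres and widths on normalised [0, 1] axes.
CONTRAST = {"low": quad_mf(0.0, 0.35), "middle": quad_mf(0.33, 0.35),
            "high": quad_mf(0.67, 0.35), "extremely high": quad_mf(1.0, 0.35)}
MOTION = {"slow": quad_mf(0.0, 0.5), "middle": quad_mf(0.5, 0.5),
          "fast": quad_mf(1.0, 0.5)}
SIGNIFICANCE = {"low": quad_mf(0.0, 0.25), "middle low": quad_mf(0.25, 0.25),
                "middle": quad_mf(0.5, 0.25), "middle high": quad_mf(0.75, 0.25),
                "high": quad_mf(1.0, 0.25)}

# Table 1: (contrast class, motion class) -> significance class.
RULES = {
    ("low", "slow"): "middle", ("middle", "slow"): "middle low",
    ("high", "slow"): "middle low", ("extremely high", "slow"): "low",
    ("low", "fast"): "high", ("middle", "fast"): "middle high",
    ("high", "fast"): "middle", ("extremely high", "fast"): "middle low",
    ("low", "middle"): "high", ("middle", "middle"): "middle high",
    ("high", "middle"): "middle high", ("extremely high", "middle"): "middle",
}


def infer(contrast, motion, steps=201):
    """Crisp output on the hypothetical [0, 1] significance scale."""
    strength = {}
    for (c_cls, m_cls), out in RULES.items():
        fire = min(CONTRAST[c_cls](contrast), MOTION[m_cls](motion))   # AND = min
        strength[out] = max(strength.get(out, 0.0), fire)              # OR = max
    num = den = 0.0
    for k in range(steps):
        y = k / (steps - 1)
        # Clip each output set at its rule strength and aggregate by maximum.
        mu = max(min(s, SIGNIFICANCE[name](y)) for name, s in strength.items())
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.5   # 0.5 fallback is arbitrary


print(round(infer(0.0, 1.0), 2), round(infer(1.0, 0.0), 2))
```

Low contrast with fast motion lands near the "high" end of the scale, and extremely high contrast with slow motion lands near the "low" end, mirroring the corresponding rows of Table 1.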

In this detection step, shown in the sub-diagram of Figure 13, both pixel-based and distribution-based difference metrics are independently tested, and the results are fused to output scene change frame candidates. Pixel-based difference metrics are treated as time series signals, where both visually abrupt cuts and the duplication of frames create observation outliers obeying Equation (8).

In time trend estimation, a piecewise linear function is used to model the trend, as opposed to a more sophisticated model. Robust statistics are also utilized to avoid the influence of type II observation outliers in trend estimation (see the definitions previously presented). The difference metrics associated with the frames from three frames before to three frames after the current frame are sorted. Assuming they are ordered $d_1 \le d_2 \le \dots \le d_7$, $d_5$ and $d_6$ are used to fit a line. The interpolated or extrapolated value at the current frame is the predicted value from the trend analysis.

This process is illustrated in Figure 14, which plots two sets of pixel-based difference metrics (for example, the sum of absolute differences) against time. The left example shows trend estimation in which a type I observation outlier is detected; note that type II observation outliers exist in every other frame. In both examples, frame 5 is the current frame, which is associated with a visually abrupt cut in the first example and with a duplicated frame in the second example.

First, samples are collected from frame 2 to frame 8 and sorted in ascending order. The four smallest values (the samples from frames 2, 4, 6, and 8 in the first example and from frames 3, 5, 6, and 7 in the second example) are then discarded, because in any 7-frame interval there can be at most four type II observation outliers. In addition, the biggest value is also discarded (the sample from frame 5 in the first example and from frame 4 in the second example), because it may correspond to a visually abrupt cut, as illustrated in Figure 14.

Finally, the samples from frames 3 and 7 (in the first example) and from frames 2 and 8 (in the second example) are chosen to fit a straight line. In the first example, the interpolated value at the current frame is much smaller than the actual value (positive deviation from the trend), whereas in the second example the interpolated value is much bigger than the actual value (negative deviation from the trend). In the context of cut detection, the former is defined as a type I observation outlier and the latter as a type II observation outlier.
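The robust trend estimate just described can be sketched as follows for a 7-frame window centred on the current frame. The function name is hypothetical, the series is assumed to be a plain list indexed by frame number, and ties between equal values are broken by frame order (an arbitrary choice). The example series are made-up values chosen to mimic the pattern of the two examples in the text, not the values plotted in Figure 14.

```python
def trend_prediction(d, t):
    """Predicted metric value at frame t from frames t-3 .. t+3 (robust linear trend)."""
    window = sorted(((k, d[k]) for k in range(t - 3, t + 4)), key=lambda kv: kv[1])
    # Discard the four smallest values (possible type II outliers) and the
    # largest value (possible abrupt cut); keep the 5th and 6th ordered samples.
    (k1, v1), (k2, v2) = window[4], window[5]
    # Straight line through the two kept samples, interpolated/extrapolated at t.
    return v1 + (v2 - v1) * (t - k1) / (k2 - k1)


# Made-up series mimicking the two examples (index = frame number, current frame 5).
cut_case = [0, 0, 0, 10, 0, 100, 0, 12, 0]   # duplicated frames at 2, 4, 6, 8
dup_case = [0, 0, 20, 1, 30, 0, 2, 1, 22]    # frame 5 is a duplicate
for d in (cut_case, dup_case):
    predicted = trend_prediction(d, 5)
    print(predicted, d[5] - predicted)  # positive residual: type I; negative: type II
```

In the first series the line is fitted through frames 3 and 7 and the residual is +89; in the second it is fitted through frames 2 and 8 and the residual is -21, matching the two cases described above.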

After the time trend is estimated, it is subtracted from the observed time series, and the hypothesis $H_0: \Delta = 0$ is tested against the alternative $H_1: \Delta \neq 0$ in Equation (8). A practical solution is obtained by considering a simple criterion of the form






where $\hat{\Delta}$ is the estimate of the displacement in the $q$th observation and $\hat{\sigma}^2_{\Delta}$ is the estimate of the variance of $\hat{\Delta}$. $\hat{\sigma}^2_{\Delta}$ can be estimated by spectral methods (see U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series, New York: Wiley, 1986) or, in accordance with the present exemplary embodiment of the invention, by substituting the usual estimates of the model parameters, including $a_r$ $(r = 1, 2, \dots, p)$, into






Note that $a_r$ $(r = 1, 2, \dots, p)$ are the same as in Equation (9). For performance reasons, a zeroth-order autoregressive model is assumed in the present exemplary embodiment of the invention, and therefore $a_r = 0$ $(r = 1, 2, \dots, p)$.

Finally, for all samples that are identified as observation outliers, if the predicted value from the trend is less than the observed value, the frame is marked as a possible shot change frame candidate.

Variance Estimate 



Observed samples that are neither type I nor type II observation outliers are used to update the variance estimate according to the following formula:

$$\hat{\sigma}^2 \leftarrow (1-\tau)\,\hat{\sigma}^2 + \tau\,(\text{frame difference signal with trend removed})$$

where $\tau$ equals 0.6.
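A minimal sketch of the per-frame outlier test and the variance update follows. The exact criterion is not reproduced here, so the sketch assumes the common form |residual| / σ̂ > c with a hypothetical critical value c = 3, and it takes the variance of the displacement estimate to equal the running variance estimate, which is what a zeroth-order model would suggest. It also squares the trend-removed residual in the variance update (an assumption; the formula above adds the residual as written). The initial variance is likewise a placeholder, and the class name is hypothetical.

```python
import math


class OutlierDetector:
    """Time series outlier test with a running variance estimate (hypothetical helper)."""

    def __init__(self, tau=0.6, var0=1.0, critical=3.0):
        self.tau = tau              # 0.6 in the exemplary embodiment
        self.var = var0             # initial variance: placeholder assumption
        self.critical = critical    # hypothetical critical value c

    def classify(self, observed, predicted):
        """Return 'type I', 'type II', or None for one frame."""
        residual = observed - predicted          # frame difference with trend removed
        sigma = math.sqrt(max(self.var, 1e-12))  # guard against a zero variance
        if abs(residual) / sigma > self.critical:
            # Trend below the observation -> type I (shot change candidate).
            return "type I" if residual > 0 else "type II"
        # Only non-outliers update the variance (squared residual: an assumption).
        self.var = (1 - self.tau) * self.var + self.tau * residual ** 2
        return None


det = OutlierDetector(var0=4.0)
print(det.classify(100, 11), det.classify(0, 21), det.classify(12, 11))  # type I type II None
```

Keeping outliers out of the variance update prevents a single cut from inflating the variance and masking the next one.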

As described in previous sections, the Kolmogorov-Smirnov test is concerned with the agreement of two sets of observed values. The null hypothesis is that the two samples come from populations with the same distribution function $F(x)$. It can be shown that



$$P\left( D_t > K_\alpha \sqrt{2/N} \right) \approx \alpha,$$

where $K_\alpha$ represents a constant that depends on the level of significance $\alpha$ and is available in tabular form in most statistics books (see, for example, J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference, Marcel Dekker, Inc., 1992). $N$ is the size of the image on which the statistic is collected, and $D_t$, defined as

$$D_t = \max_j \left| EDF_t(j) - EDF_{t-1}(j) \right|,$$

is constructed directly from the histogram of each DC image (see Equation (4)).

In each frame, a new significance level $\alpha$ is acquired from the adaptation step. This $\alpha$ is used in a test involving not only the entire image but also each of its four equally divided sub-images. Assume that $n$ out of the 4 regions show changes. The integrated count is defined as follows:

$$KS_t = \begin{cases} 2n + 1 & \text{if the statistic based on the entire image shows a change,} \\ 2n & \text{otherwise.} \end{cases} \qquad (12)$$



This integrated count weights a change detected in any one of the four regions twice as heavily as a change detected in the entire image.
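The integrated count of Equation (12) can be sketched as below. Instead of a table lookup, `k_alpha` uses the standard large-sample approximation $K_\alpha = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ for the two-sample Kolmogorov-Smirnov test with equal sample sizes, which is an assumption rather than the procedure stated in the text. Each quadrant is assumed to contain a quarter of the image's pixels, and the function names are hypothetical.

```python
import math


def k_alpha(alpha):
    """Large-sample critical constant of the two-sample Kolmogorov-Smirnov test."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0))


def shows_change(d, n, alpha):
    """Reject 'same distribution' when D exceeds K_alpha * sqrt(2 / n) (equal sample sizes n)."""
    return d > k_alpha(alpha) * math.sqrt(2.0 / n)


def integrated_count(d_whole, d_quadrants, n_pixels, alpha):
    """2 per changed quadrant, plus 1 if the entire image shows a change."""
    n_changed = sum(shows_change(d, n_pixels / 4.0, alpha) for d in d_quadrants)
    return 2 * n_changed + (1 if shows_change(d_whole, n_pixels, alpha) else 0)


print(round(k_alpha(0.05), 3))                                   # 1.358
print(integrated_count(0.25, [1.0, 0.0, 0.0, 0.0], 1024, 0.05))  # 3
```

Because each quadrant has fewer pixels, its rejection threshold is twice that of the entire image, so a quadrant-level change must be more pronounced before it contributes its weight of 2.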

In this stage, the integrated count produced by the hierarchical Kolmogorov-Smirnov test is used to eliminate false positives produced by the time series outlier detection process. Denote the integrated count produced for frame $t$ by $KS_t$. The system eliminates a scene change frame candidate if

When the system reaches this step, most false positives will have been eliminated. However, some may remain because of flash lights or extremely fast camera motion, and a criterion can be set up to eliminate them. The list of detected cut frames is stored in a delay memory, as shown by the dashed box attached to the decision step in Figure 7, and the distance between any two cuts is examined. Any cut that occurs within a distance less than a preset value (five frames in the present exemplary embodiment) is eliminated.
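The decision step can be sketched as a two-stage filter. Because the elimination condition on the integrated count is not reproduced above, the sketch uses a hypothetical minimum count `min_count`. It also resolves the minimum-distance rule by keeping the earlier of two close cuts, which is one possible reading of the text. All names are hypothetical.

```python
def decide(candidates, ks_counts, min_count=1, min_gap=5):
    """Filter scene change candidates into the final cut list.

    candidates: frame indices flagged by the outlier test.
    ks_counts: mapping frame -> integrated Kolmogorov-Smirnov count.
    """
    # Stage 1: drop candidates that the distribution-based test does not support.
    supported = [t for t in sorted(candidates) if ks_counts.get(t, 0) >= min_count]
    # Stage 2: delay-memory check; drop a cut closer than min_gap frames to the last kept one.
    cuts = []
    for t in supported:
        if cuts and t - cuts[-1] < min_gap:
            continue
        cuts.append(t)
    return cuts


print(decide([10, 12, 40, 80], {10: 3, 12: 5, 40: 0, 80: 2}))  # [10, 80]
```

Frame 40 is rejected by the first stage, and frame 12 is rejected by the second because it lies only two frames after the cut at frame 10.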

The proposed cut browsing apparatus contains three basic elements: the Video Player, the Video Browser, and the Cut Browser. These three elements provide video browsing at three different levels of abstraction. The video player and the cut browser both present only raw video material, although in the latter case the presented material is at a different level of abstraction. The video browser, by contrast, presents interpreted information about the shots in the video. Together, these three components provide a tool for any user who wishes to perform the following three tasks: searching the content of the video, deciding whether the cut detection method may have missed or misdetected a shot, and creating or eliminating cuts.

The proposed cut browsing apparatus works as follows. When it is started, three windows are displayed on the screen, as shown in Figure 15, which shows the cut browsing apparatus.

The video browser shows all detected shots using representative frames (Rframes), while the cut browser displays the cross-section image made from the entire video. The activities in all three windows are completely synchronized. For example, if the user uses the mouse to click on any point in the cut browser, the associated representative frame in the video browser is highlighted and the video player starts playing back the video from the same point. Whenever the video browser is scrolled, the cross-section image scrolls accordingly.
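A cross-section image can be built by taking one row (or one column) from every frame and stacking the results over time, so that each frame contributes a one-dimensional strip. The sketch below assumes frames stored as lists of rows and uses the middle row and middle column, a hypothetical choice of slicing planes. The helper `split_by_shots` (also hypothetical) groups the strips by a list of cut frames to show how the image could be divided into per-shot segments.

```python
def cross_sections(frames):
    """Horizontal and vertical cross-section images of a video volume.

    frames: list of equally sized 2-D frames (lists of rows).
    Returns two lists with one strip per frame.
    """
    h, w = len(frames[0]), len(frames[0][0])
    mid_row, mid_col = h // 2, w // 2            # hypothetical slicing planes
    horizontal = [list(f[mid_row]) for f in frames]
    vertical = [[f[r][mid_col] for r in range(h)] for f in frames]
    return horizontal, vertical


def split_by_shots(strips, cut_frames):
    """Group consecutive strips into segments that start at each cut frame."""
    bounds = [0] + sorted(cut_frames) + [len(strips)]
    return [strips[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


video = [[[t * 10 + r * 3 + c for c in range(3)] for r in range(3)] for t in range(4)]
hor, ver = cross_sections(video)
print(hor[0], ver[0])                  # [3, 4, 5] [1, 4, 7]
print(len(split_by_shots(hor, [2])))   # 2
```

Within a shot, consecutive strips change smoothly and form a continuous pattern; an abrupt cut shows up as a discontinuity between adjacent strips, which is what makes missed or misdetected shots visible.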

To carry out the first function, namely to search the content of the video, the user would examine each representative frame in the video browser to see whether it is the desired shot. For every shot, the user would also examine the corresponding cross-section image in the cut browser to see whether there is any pattern of missed shots. When any point is in doubt, the user would click on the point in the cut browser to look at the raw video playback in the video player.

To decide if the cut detection method may have missed or misdetected a shot, the user could simply examine the 
cross-section image in the cut browser. If there is any pattern of missed or misdetected shots, the user would then click 
on the point in the cut browser to look at the raw video playback in the video player. 

The cut browser also provides the editing capability. A user can break any shot into two or merge any two shots into one through a mouse or a button click. When this event happens, the cross-section image is segmented differently and the list of representative frames is updated as well.
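The break and merge operations amount to inserting or removing a boundary in the sorted list of cut frames, after which the shot ranges (and any per-shot data such as representative frames) are recomputed. The sketch below is a hypothetical illustration of that bookkeeping, assuming that a cut is identified by the first frame of the new shot.

```python
import bisect


def break_shot(cuts, frame):
    """Split the shot containing `frame` by inserting a cut there."""
    cuts = list(cuts)
    i = bisect.bisect_left(cuts, frame)
    if i == len(cuts) or cuts[i] != frame:   # ignore if already a boundary
        cuts.insert(i, frame)
    return cuts


def merge_shots(cuts, frame):
    """Merge the two shots on either side of the cut at `frame`."""
    return [c for c in cuts if c != frame]


def shot_ranges(cuts, n_frames):
    """Inclusive (first, last) frame of each shot."""
    bounds = [0] + list(cuts) + [n_frames]
    return [(a, b - 1) for a, b in zip(bounds, bounds[1:]) if b > a]


cuts = break_shot([10, 30], 20)
print(cuts, shot_ranges(cuts, 40))  # [10, 20, 30] [(0, 9), (10, 19), (20, 29), (30, 39)]
print(merge_shots(cuts, 20))        # [10, 30]
```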

The cut detection algorithm in accordance with the invention was applied to 24 video clips comprising more than 149,000 MPEG-1 PAL and NTSC video frames. These videos come in a variety of types, including movie trailers, cartoons, and sports videos, as listed in Table 2. Table 2 also shows the number of abrupt cuts, the number of missed cuts, and the number of falsely detected cuts; these are experimental results from the automatic cut detection method in accordance with the invention. Shots that are missed or misdetected all show clear patterns in cross-section images and are therefore easily identified manually using the present cut browsing apparatus.

The present invention is believed to provide recall and precision rates on the order of 99%, and it can process video, from reading the input to outputting the shot change frames, at or near video rate. The cut browsing apparatus is capable of substantially achieving a 100% recall rate. The invention takes into account the nonstationary nature of the cut detection problem and has been tested on a variety of videos totaling over 149,000 video frames.

Clearly, the present invention can be implemented in conjunction with a suitably programmed digital computer, using programming techniques known in the art.

While the present invention has been described by way of exemplary embodiments, these are not intended to be limiting, and various changes and modifications will be apparent to those skilled in the art to which it pertains. For example, various details can be rearranged in the screens of the browsers without departing from the spirit of the invention. Furthermore, various alternative slicing planes for deriving cross-section images can be substituted in a manner consistent with the invention. These and like changes are intended to be within the scope and contemplation of the invention, which is defined by the claims following.

Claims 

1. A method for detecting a cut in a video, comprising the steps of:

(a) acquiring video images from a source; 

(b) deriving from said video images a pixel-based difference metric; 

(c) deriving from said video images a distribution-based difference metric;

(d) measuring video content of said video images to provide up-to-date test criteria; 

(e) combining said pixel-based difference metric and said distribution-based difference metric, taking into account said up-to-date test criteria provided in step (d), so as to derive a scene change candidate signal; and

(f) filtering said scene change candidate signal so as to generate a scene change frame list. 

2. A method for detecting a cut in a video in accordance with claim 1, wherein said pixel-based difference metric for each frame is the summation of an absolute frame difference representative of image intensity value at selected pixel locations in a frame.

3. A method for detecting a cut in a video in accordance with claim 1, wherein said pixel-based difference metric for each frame $t$ is the sum of an absolute frame difference,

$$d_t = \sum_{i,j} \left| f^{t}_{ij} - f^{t-1}_{ij} \right|,$$

where $f^{t}_{ij}$ represents the intensity value at pixel location $(i,j)$ in frame $t$.

4. A method for detecting a cut in a video in accordance with claim 1, wherein each image is divided into a number of sub-regions and wherein said distribution-based difference metric is a Kolmogorov-Smirnov test metric, except that one each is computed herein for the entire image as well as its sub-regions.

5. A method for detecting a cut in a video in accordance with claim 1, wherein each image is equally divided into four sub-regions and wherein said distribution-based difference metric is a Kolmogorov-Smirnov test metric, except that one each is computed herein for the entire image as well as said four equally divided sub-regions.

6. A method for detecting a cut in a video in accordance with claim 1, wherein said step of measuring video content of said video images to provide up-to-date test criteria provides said step (e) with the ability to automatically adjust to different video contents.

7. A method for detecting a cut in a video in accordance with claim 1, wherein said video images are DC images represented by the base frequency in the Discrete Cosine Transform coefficients characterizing the underlying image frame.

8. A method for detecting a cut in a video in accordance with claim 1, wherein said step of measuring video content of said video images to provide up-to-date test criteria comprises collecting statistics from each DC image and each pair of DC images to represent current video content, namely an image contrast estimate and a motion estimate, wherein the image contrast estimate is computed based on a recursive scheme to suppress the influences of sudden lighting changes.

9. A method for detecting a cut in a video in accordance with claim 1, wherein said collecting statistics from each DC image and each pair of DC images to represent current video content represent an image contrast estimate and a motion estimate.

10. A method for detecting a cut in a video in accordance with claim 1, wherein said image contrast estimate is computed based on a recursive scheme to suppress the influences of sudden lighting changes.

11. A method for detecting a cut in a video in accordance with claim 8, wherein said image contrast estimate is derived in accordance with the following:

$$\text{contrast}_t = (1-\tau)\,\text{contrast}_{t-1} + \tau\,\sigma_{t-1}$$

where $\sigma_{t-1}$ is the intensity variance estimate of the DC image at time $t-1$.

12. A method for detecting a cut in a video in accordance with claim 8, wherein said image contrast estimate equals 0.6.

13. A method for detecting a cut in a video in accordance with claim 8, wherein said motion estimate is computed as follows:

$$\text{motion}_t = (1-\tau)\,\text{motion}_{t-1} + \tau\,\frac{1}{N}\sum_{i,j}\left| f^{t-1}_{ij} - f^{t-2}_{ij} \right|$$

where $f^{t-1}_{ij}$ is the intensity value at pixel location $(i,j)$ of the DC image at time $t-1$ and $N$ is the size of the image.
14. A method for detecting a cut in a video in accordance with claim 11, wherein $\tau$ equals 0.6.

15. A method for detecting a cut in a video in accordance with claim 8, wherein said image contrast and said motion estimates are applied to a fuzzy engine to compute a new significance level for the hierarchical Kolmogorov-Smirnov test, said fuzzy engine using a quadratic membership function, where each contrast measurement is divided into classes, from low to high, each motion estimate is divided into classes, from slow to fast, and each significance level is divided into classes, from high to low.

16. A method for detecting a cut in a video in accordance with claim 15, wherein each contrast measurement is divided into four classes, low, middle, high, and extremely high, each motion estimate into three classes, slow, middle, and fast, and each significance level into five classes, high, middle high, middle, middle low, and low, and wherein the fuzzy rules are stated in a simple IF/THEN format, where values are combined using AND (minimum) or OR (maximum) operations.

17. A method for detecting a cut in a video in accordance with claim 16, including a step of defuzzifying said fuzzy rules to yield a crisp final output value by finding the center of gravity of the combined output shape, whereby all rules are ensured to contribute to the final crisp result.

18. A method for detecting a cut in a video in accordance with claim 1, wherein in said step (e) of combining said pixel-based difference metric and said distribution-based difference metric, taking into account said up-to-date test criteria provided in step (d) so as to derive a scene change candidate signal, said pixel-based difference metrics are treated as time series signals, where both visually abrupt cuts and duplication of frames create observation outliers.

19. A method for detecting a cut in a video in accordance with claim 18, wherein said pixel-based difference metric is treated as a time series signal, where both visually abrupt cuts and the duplication of frames create observation outliers obeying the equation

$$d_t = \begin{cases} f(d_{t-1}, d_{t-2}, \dots, d_{t-p}) + u_t & \text{if } t \neq q, \\ f(d_{t-1}, d_{t-2}, \dots, d_{t-p}) + u_t + \Delta & \text{otherwise,} \end{cases}$$

where $t$ represents the time index, $\Delta$ is the outlier, and $f(d_{t-1}, d_{t-2}, \dots, d_{t-p})$ models the trend in the series.

20. Apparatus for detecting a cut in a video, comprising: 


(a) means for acquiring video images from a source; 

(b) means for deriving from said video images a pixel-based difference metric; 

(c) means for deriving from said video images a distribution-based difference metric;

(d) means for measuring video content of said video images to provide up-to-date test criteria;

(e) means for combining said pixel-based difference metric and said distribution-based difference metric, taking into account said up-to-date test criteria provided in step (d), so as to derive a scene change candidate signal; and

(f) means for filtering said scene change candidate signal so as to generate a scene change frame list. 

21. Apparatus for detecting a cut in a video in accordance with claim 20, including means for presenting two cross-section images of said video images, whereof one is a horizontal cross-section image in a horizontal direction and the other is a vertical cross-section image in a vertical direction of the video volume.

22. Apparatus for detecting a cut in a video in accordance with claim 21, wherein each cross-section image is constructed by sampling one row (or column) from every image, and reducing the amount of information from a two-dimensional image to two one-dimensional image strips.

23. Apparatus for detecting a cut in a video in accordance with claim 22, wherein said horizontal and vertical cross-section images are combined into one image segmented into two bands according to a list of detected shots, whereby a level of abstraction is presented that is enough to reveal whether there is a missed or misdetected shot.

24. Apparatus for detecting a cut in a video in accordance with claim 20, including means for presenting at least two cross-section images of said video images, whereof one is a horizontal cross-section image in a horizontal direction and the other is a vertical cross-section image in a vertical direction of the video volume.


25. Apparatus for detecting a cut in a video in accordance with claim 24, wherein each cross-section image is constructed by sampling one row (or column) from every image, and reducing the amount of information from a two-dimensional image to a plurality of one-dimensional image strips.

26. Apparatus for detecting a cut in a video in accordance with claim 25, wherein said at least two cross-section images, including said horizontal and vertical cross-section images, are combined into one image segmented into a plurality of bands according to a list of detected shots, whereby a level of abstraction is presented that is enough to reveal whether there is a missed or misdetected shot.
